Text File | 1990-12-11 | 401.6 KB | 12,353 lines |
- 54.
- Subject: access times from backup not being reset
- Date: Thu, 16 Feb 89 14:44:35 PST
- From: Fred Douglis <douglis>
-
- I have a script that removes binaries from *.old (/sprite/cmds.*.old,
- etc.). It checks to see if the access time of a file is very old,
- because I wouldn't want to remove the old version of an installed
- command if the command was moved to .old only recently.
-
- All the files in *.old, and in my home directory, have been accessed
- in the past 2-3 days late at night. It looks like tar isn't resetting
- the access times when it dumps something. Since this is something we
- talked about, and my impression was that it was implemented, I thought
- I should mention the problem.
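- A common way for a dump program to leave access times alone is to
- record them with stat() before reading the file and restore them with
- utime() afterwards. The following C sketch shows that pattern; the
- name ReadPreservingAtime is hypothetical, and this is not the code of
- Sprite's tar.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <utime.h>

/*
 * Hypothetical helper: read a file (as a dump program would) and then
 * restore its original access and modification times, so the read does
 * not make the file look recently used.  Returns the number of bytes
 * read, or -1 on error.
 */
long ReadPreservingAtime(const char *path)
{
    struct stat before;
    struct utimbuf times;
    char buf[4096];
    size_t n;
    long total = 0;
    FILE *fp;

    if (stat(path, &before) != 0) {
        return -1;
    }
    fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        total += (long)n;   /* a real dumper would copy buf to the archive */
    }
    fclose(fp);

    /* Put back the times recorded before the read. */
    times.actime = before.st_atime;
    times.modtime = before.st_mtime;
    if (utime(path, &times) != 0) {
        return -1;
    }
    return total;
}
```

- Note that utime() itself updates the inode change time (st_ctime) and
- requires ownership or root privileges, so this only hides the read
- from tools that look at st_atime, as the *.old cleanup script does.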
-
-
- 55.
- Subject: bug: portmap in debugger/nfsmount hung
- Date: Mon, 20 Feb 89 14:06:23 PST
- From: Fred Douglis <douglis>
-
- I wasn't getting through the remote link to nfs properly, and I checked
- on oregano. portmap was in the debugger. I did a gcore on portmap
- (output in /tmp/portmap.core) in case anyone wants to look at it. I
- then found that restarting portmap wasn't sufficient, I had to kill
- and restart the nfsmount daemon I was interested in. This caused
- recovery to take place next time I tried accessing /sprite2. However,
- both before and after the problem occurred, I found that ls was
- printing "compat: invalid status 0xffffffff", and the only difference
- is that after I restarted nfsmount, it went on without a hitch even
- with the error message.
-
- 56.
- Date: Wed, 22 Feb 89 22:40:33 PST
- From: jhh (John H. Hartman)
- Subject: 0-length text segments
-
- Whenever paprika compiles a file it produces a garbage object that
- has a 0-length text segment. We were having this problem all
- afternoon and it suddenly occurred to me that maybe one machine was
- sick. I put it into the debugger in case anyone wants to look at
- it. "Cat"ing one file into another seems to work, so I can't
- understand why only compiler output gets trashed.
-
- 57.
- Date: Sat, 25 Feb 89 12:10:34 PST
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: murder and thyme dance
-
- When I came in today, murder and thyme appeared to be looping
- sending RPCs between each other.
-
- 58.
- Date: Mon, 27 Feb 89 08:58:00 PST
- From: ouster (John Ousterhout)
- Subject: Bug: mace crash (migration-related?)
-
- When I came in this morning and hit the first keystroke, Mace
- immediately entered the debugger. I got two messages in my syslog
- window: the first said "Evicting 1 processes", and the second said
- "Error 1 in SendProcessState" or something like that. Then the machine
- went into the debugger.
-
- 59.
- Date: Mon, 27 Feb 89 09:24:57 PST
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: Bug in brk compatibility
-
- The brk() system call emulation doesn't shrink the heap segment when given
- an address less than the current end of heap. This causes programs that
- use brk() for allocating and freeing memory to grow without bound.
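- As a sketch of the expected behavior, assuming a hypothetical model in
- which the heap segment is described by a base, a current break, and a
- growth limit, brk() emulation should accept any address in that range
- and move the break down as well as up:

```c
#include <stdint.h>

/* Hypothetical model of a process heap segment for brk() emulation. */
typedef struct {
    uintptr_t base;   /* start of the heap segment */
    uintptr_t end;    /* current break (first address past the heap) */
    uintptr_t limit;  /* largest break the segment may grow to */
} HeapSegment;

/*
 * Emulate brk(addr): move the break to addr in either direction.
 * Returns 0 on success, or -1 if addr lies outside [base, limit].
 * The reported bug corresponds to honoring only addr > end.
 */
int EmulateBrk(HeapSegment *seg, uintptr_t addr)
{
    if (addr < seg->base || addr > seg->limit) {
        return -1;
    }
    /* Growing and shrinking are both just an update of the break here. */
    seg->end = addr;
    return 0;
}
```

- A real implementation would also round to page boundaries and release
- the pages above a lowered break; that part is omitted.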
-
- 60.
- Subject: bug in new gcc?
- Date: Thu, 02 Mar 89 12:20:27 PST
- From: Fred Douglis <douglis>
-
- Trying to compile loadavg, I now get an error
-
- loadavg.c:68: initializer for floating value is not a floating constant
-
- This program used to compile just fine, and they sure look like
- floating constants to me!
-
- 61.
- Date: Sat, 4 Mar 89 16:24:18 PST
- From: ouster (John Ousterhout)
- Subject: Gdb bug
-
- If a program being debugged by Gdb exits, gdb prints the message
- "Program exited normally". But if I then quit from gdb, the process
- is left around in DEBUG state. Shouldn't gdb clean up this loose end?
-
-
- 62.
- Date: Mon, 6 Mar 89 13:50:30 PST
- From: ouster (John Ousterhout)
- Subject: Bug (lpd can't handle printer death)
-
- It appears that the lpd system is unable to deal with the death of
- a printer. I made the mistake of turning off my printer in the middle
- of a long printout, and when I turned the printer on again there was
- no way to print anything on it. I tried aborting and restarting the
- printer with lpc, but even that didn't shake things loose. The only
- thing I've been able to find that works is to reboot the machine. This
- seems to be repeatable. Bob, can you take a look? Mace is currently
- in the hung-printer state, if you have a chance to look at it before I
- need a printout and reboot.
-
- 63.
- Date: Tue, 7 Mar 89 16:49:48 PST
- From: jhh (John H. Hartman)
- Subject: prefix bug
-
- The following prefix input will cause a bus error:
-
- prefix -x /foo -M /bar
-
- 64.
- Date: Tue, 7 Mar 89 17:54:16 PST
- From: jhh (John H. Hartman)
- Subject: another prefix bug
-
- If you do something like
-
- prefix -x /foo -M /hosts/cayenne/dev/rsd0a
-
- you will put the kernel in the debugger.
-
- 65.
- Date: Thu, 9 Mar 89 16:43:52 PST
- From: mgbaker (Mary Gray Baker)
- Subject: xbiff problems?
-
- I'm just wondering if anyone else is still having problems with xbiff
- giving them the message "XIO: Unknown error."
-
- 66.
- Date: Mon, 13 Mar 89 09:33:50 PST
- From: ouster (John Ousterhout)
- Subject: Bug: mail duplication
-
- Has anyone else noticed duplication of mail messages? For example,
- I got two copies of my last message about large LocalFileIOHandles.
- I've also noticed this a few times in the recent past, including a
- message sent to a different distribution list than Sprite (so it
- can't be just a problem with the sprite distribution list). I'm
- not sure whether the problem is 100% reproducible.
-
- 67.
- Subject: bug: /initsprite is not machine-independent
- Date: Wed, 15 Mar 89 10:35:16 PST
- From: Fred Douglis <douglis>
-
- That is to say, when someone installed a new initsprite on March 8,
- sun2's stopped booting because initsprite is a sun-3 binary.
-
- 68.
- Date: Wed, 15 Mar 89 11:17:20 PST
- From: gibson (Garth Gibson)
- Subject: ls convention irregularity
-
- If a file in the local directory is a symbolic link to another directory,
- then ls -sF lists it as a directory (suffix is /), although ls -l shows it as a link.
- This differs from both vax and sun unix (which use the suffix @ for a
- symbolic link).
- If the local symbolic link points to a file then sprite conforms
- with unix (its suffix is @).
-
- 69.
- Subject: bug: tty should be like unix tty
- Date: Wed, 15 Mar 89 16:12:18 PST
- From: Fred Douglis <douglis>
-
- In BSD unix, one can say something like "rcp foo:bar `tty`" to copy to
- the terminal invoking the command. /dev/tty may be used similarly.
- In sprite, tty is a program to create a terminal driver with a
- pseudo-device, and /dev/tty doesn't exist.
-
- (Before anyone does anything about renaming tty, beware that some
- scripts may invoke tty. For example, /hosts/pride/bootcmds runs tty
- on /dev/console to make login use a terminal that understands control
- characters.)
-
- 70.
- Date: Thu, 16 Mar 89 11:31:57 PST
- From: ouster (John Ousterhout)
- Subject: Printing software broken?
-
- I'm no longer able to print on Mace's printer. Lpq prints this:
-
- Ready and printing.
- Rank Owner Job Files Total Size
- active ouster 23 (standard input) 14655 bytes
-
- but nothing happens. Can you take a look? I tried rebooting, thinking
- the kernel might be wedged, but that didn't solve the problem. I also
- tried power-cycling the printer; this also didn't help. Then I noticed
- that psdit seems to be looping infinitely. I tried a few test cases,
- including files that I KNOW printed a few days ago, but psdit always
- seems to get into an infinite loop. Printing still works OK for files
- that aren't coming from ditroff.
-
-
- 71.
- Date: Thu, 16 Mar 89 17:48:56 PST
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: /sprite/spool/mail/mendel corrupted.
-
- My incoming mail file (/sprite/spool/mail/mendel) appears to have been
- corrupted. I got a message from Susan Eggers that was inserted in the
- middle of the last message rather than appended to the file.
-
- 72.
- From: rab (Robert A. Bruce)
- Subject: make install bug
- Date: Thu, 16 Mar 89 18:28:33 PST
-
- When I run make install in either /a/adobecmds/* or /sprite/src/admin/*
- make tries to copy the previously installed executable to */sun3.md.old,
- but can't do it because the sun3.md.old directories don't exist.
- So I have to remove the currently installed program before it will
- install the new one.
-
- 73.
- Subject: bug: migrating X application hits negative refcount
- Date: Fri, 17 Mar 89 13:03:47 PST
- From: Fred Douglis <douglis>
-
- for example:
- % xman&
- % sleep a while
- % mig -p <xman_pid>
- your host goes into the debugger with a negative write count on the
- pdev stream. If continued, it will continue to enter the debugger
- with a complaint about unknown lclpdev. If the process is kill
- -KILLed on the other host, the home node may be continued without a
- problem.
-
- 74.
- Subject: bug: vm pagein/pageout errors and signals == deadlock
- Date: Tue, 21 Mar 89 12:32:51 PST
- From: Fred Douglis <douglis>
-
- Paprika hit a monitor deadlock when oregano crashed and rebooted. JHH
- and I chained through the processes and found that the following
- sequence of events took place: ...
-
- 75.
- Subject: update -l change
- Date: Wed, 22 Mar 89 16:51:59 PST
- From: Fred Douglis <douglis>
-
- I tried to install vm but hit a complaint from update about symbolic
- links. Did someone change kernel.mk recently to make it copy the
- files referenced by symbolic links? Anyway, I had to remove the
- symbolic links before updating the files that had been changed, so it
- would install new files rather than complaining about the mismatch.
- (I'm sending mail so no one else wastes time tracking down the same
- problem.)
-
- Furthermore, the complaint by update is misleading: it says that the
- source file is a real file when the target is a symbolic link, whereas
- in fact the source file is a symbolic link but it appears as a regular
- file to update because of the "-l" option.
-
- 76.
- Date: Wed, 22 Mar 89 21:11:13 PST
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: bug in reset command
-
- When I type reset from a vt100 terminal I get the message
-
- Cannot open /usr/lib/tabset/vt100
-
- 77.
- Date: Thu, 23 Mar 89 10:54:12 PST
- From: gibson (Garth Gibson)
- Subject: tx
-
- I just used the "~h" command in Mail in a tx shell that was "rsh"'d
- to pepper. The To: line contained a lot of names. I heard a set of
- beeps and then the window became useless. I killed it and started
- a new one.
- garth
-
- 78.
- Date: Thu, 23 Mar 89 11:13:58 PST
- From: gibson@pepper.berkeley.edu (Garth Gibson)
- Subject: basil lockup
-
- A few minutes ago basil locked up. It had been running 6 days (that is
- all I remember about the kernel that was running). I had just completed
- a mail message in a tx window, rsh'd to pepper when the mouse froze.
- The spritemon continued, but L1-v etc did not generate output. Brent
- rsh'd in and found nothing interesting (Xsprite was OK). Finally I did
- L1-k which did get me control. Then C-c. I suppose I could have started
- a new X at this time, but instead I rebooted (22 Mar 89 18:19:35) kernel.
- garth
-
-
- 79.
- Date: Sun, 26 Mar 89 12:55:06 PST
- From: brent (Brent Welch)
- Subject: mace crash
-
- Mace died inside Mach_MonPutChar when printing a message
- about "[1] + Segmentation violation Xsprite\n". The
- error was inside the prom, I think, and was probably an
- address error of some sort. It ended up panicking three times
- on the way into the debugger, first from Mach_MonPutChar,
- then from IdleLoop() because I'll bet that interrupts were off,
- and then again inside Mach_MonPutChar as it tried again to
- print an error message.
-
-
- 80.
- Date: Tue, 28 Mar 89 09:17:07 PST
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: oregano and thyme in debugger
-
- When I came in this morning oregano was in the debugger with the message:
- Fatal Error: Fs_RpcStartMigration, unknown lclPdev handle <..>.
- and thyme was in the debugger with a bus error.
-
-
- 81.
- Date: Tue, 28 Mar 89 17:29:55 PST
- From: brent (Brent Welch)
- Subject: mint's ipServer died
-
- I think mint's ipServer died today when /sprite was filling.
- Mint swaps to /sprite and I'll bet the ipServer got a swap error.
- I've restarted the ipServer and currently there is plenty
- of disk space.
-
- 82.
- Date: Wed, 29 Mar 89 09:54:54 PST
- From: jhh (John H. Hartman)
- X-Mailer: Mail User's Shell (6.4 2/14/89)
- Subject: sage and mint dancing
-
- When I came in this morning (somehow I managed to be the first one
- here), sage and mint were in a recovery dance. Sage would complain
- about a stale file handle for /sprite/admin/migInfo, they would
- recover, and then sage would complain again. This went on every
- few seconds all night from the look of the pile of paper behind
- mint's console. Mint was complaining that there was no stream
- associated with the file. Has the recovery code been modified
- recently?
-
- 83.
- Subject: Re: mkmf bug (and more file rot)
- Date: Wed, 29 Mar 89 13:18:44 PST
- From: Fred Douglis <douglis>
-
- yes, that's exactly it. I was going to change it but couldn't check
- out mkmf.map because its RCS file is garbage. Looks like a line from
- the migInfo file at the start of the RCS file!! This either means
- recovery screwed up and let the wrong file get written, or the disk
- got trashed at some point.
-
-
- 84.
- Date: Wed, 29 Mar 89 17:42:26 PST
- From: ouster (John Ousterhout)
- Message-Id: <8903300142.AA334635@sprite.Berkeley.EDU>
- To: sprite
- Subject: Bogus messages
-
- Why do I keep getting syslog messages like these?
-
- Fs_NotifyWriter, bad handle
- Fs_NotifyWriter, bad handle
- Fs_NotifyWriter, bad handle
- Fs_NotifyWriter, bad handle
-
- Is this an over-conservative check that should simply be eliminated?
-
-
-
-
- 85.
- Subject: bug: oregano fs deadlock
- Date: Fri, 31 Mar 89 15:55:01 PST
- From: Fred Douglis <douglis>
-
- Oregano hung an rpc for me earlier today, then started wedging things
- left and right. I was able to debug it for a while before kgdb core
- dumped on me, then I gave up. The backtrace is in /tmp/oregano.where
- in case Brent wants to look at it -- it showed at least a couple of
- processes hung in Pfs stuff.
-
-
- 86.
- Date: Fri, 31 Mar 89 16:58:44 PST
- From: ouster (John Ousterhout)
- Subject: Pseudo-device buffering problem?
-
- Even with the new version of the tty driver, it appears to me that
- too much buffering is going on in the pdev implementation. For
- example, if I rlogin to Sprite using the new rlogind, cat a long
- file, and then type ^C, an awful lot more characters come out before
- the ^C takes effect. I tried reducing the size of the pdev buffer
- and the tty buffer, but this had no noticeable effect on the # of
- characters that come out before signals take effect.
-
-
- 87.
- Date: Sun, 2 Apr 89 17:35:46 PDT
- From: jhh (John H. Hartman)
- Subject: rlogin problem
-
- If I try to rlogin from unix and I decide not to login I can't kill
- the login prompt. '^D' doesn't seem to work.
-
-
- 88.
- Date: Mon, 3 Apr 89 16:39:09 PDT
- From: douglis (Fred Douglis)
- Subject: X (tx) bug: window on rebooted host hangs system
-
- I made the mistake of running tx on mint with the display on
- paprika, then trying to click in the tx window after mint had been rebooted.
- From that point on, I couldn't get the input focus or do anything else; even
- xkill said it couldn't grab the mouse, so I couldn't kill the tx window. I
- finally had to restart X. Seems like we need a way for connections to rebooted
- hosts to be forcibly destroyed, and for them to time out when appropriate as
- well.
-
-
- 89.
- Subject: bug: "rsh host cmd" hits bus error
- Date: Mon, 03 Apr 89 17:52:03 PDT
- From: Fred Douglis <douglis>
-
- I can do "rsh xxx" but not "rsh xxx cmd" -- it hits a bus error.
- Seems the installed rsh is dated november, and there's an uninstalled
- one dated Mar 24. Can the uninstalled one be installed, so we can
- debug this problem if it persists?
- rsh with a command argument worked not too long ago.
-
-
- 90.
- Subject: bug: repeating device write
- Date: Tue, 04 Apr 89 02:35:59 PDT
- From: Fred Douglis <douglis>
-
- Several times today, a host has gotten into a funny situation in
- which it repeatedly wrote the same line someplace as the result of a
- single write operation. The first time, paprika's syslog printed the
- same SU message repeatedly, and Mendel and I looked at it but couldn't
- track down the problem, and it cleared itself up after we resumed.
- The second time, I believe it was oregano with the problem (also
- syslog), and the third time it was an rlogin from thyme to murder
- where the same line from a process running on murder kept getting
- written over and over. I threw thyme into the debugger on general
- principles, but I'm leaving now, so I don't know if this can be looked
- into. I'm reporting the bug so people know to be on the lookout, and
- maybe we can debug it sometime under more reasonable circumstances.
-
-
- 91.
- Date: Wed, 5 Apr 89 12:13:28 PDT
- From: douglis (Fred Douglis)
- Subject: bug: device recovery
-
- I had been catting /hosts/nutmeg/dev/syslog earlier, then after a
- reboot I got Recovery failed <1> (as usual) but this time hit subsequent
- errors:
- [thyme]/sprite/users/douglis (5)% cat /hosts/nutmeg/dev/syslog
- cat: read error: stale remote file handle
- [thyme]/sprite/users/douglis (6)% !!
- cat /hosts/nutmeg/dev/syslog
- /hosts/nutmeg/dev/syslog: invalid argument
- [thyme]/sprite/users/douglis (7)% !!
- cat /hosts/nutmeg/dev/syslog
- /hosts/nutmeg/dev/syslog: invalid argument
-
-
-
- 92.
- Date: Fri, 7 Apr 89 14:41:12 PDT
- From: douglis (Fred Douglis)
- Subject: problem with kmsg?
-
- %kmsg -v basil
- RecvReply: Error reading socket.
- Debug
- any idea what's up? I saw this yesterday too.
-
-
- 93.
- From: tve@ernie.Berkeley.EDU (Thorsten Von Eicken)
- Date: Sat Apr 8 00:27:09 PDT 1989
- Subject: sprite dies in Pdev
-
- I use the Pdev library. I can open the server side of a pdev, but as soon
- as I receive a client's open request, the server dies and takes the machine
- with it.
-
- I ran my program in the debugger. I get to PdevServiceRequest which calls
- my open service routine. The flags passed to the service routine look
- very suspicious:
-
- (gdb) step
- ServOpen (cd=(ClientData) 0x0, f=(struct Pdev_Stream *) 0x26408,
-     buff=(caddr_t) 0x25078 "\377\377\377\377", flags=4231170, proc=724277,
-     host=13, user=2984, sel=(ClientData) 0xdfdfce8) (comm.c line 247)
-
- in my service routine, I determine I dislike the flags and return with EACCES.
- I get back into PdevServiceRequest (without changing the selectBits) which
- then calls ReplyNoData. The thing then dies in that function (I haven't
- traced more).
-
-
- 94.
- Subject: bug: rlogind infinite loop when userLog locked
- Date: Sat, 08 Apr 89 14:46:43 PDT
- From: Fred Douglis <douglis>
-
- Symptoms: user rlogins to sprite and exits; never returns to remote
- host. On sprite, rlogind is in the READY state much of the time.
-
- A backtrace showed rlogind in flock. Before calling flock, it sets up
- an interval timer to send SIGALRM in 10 seconds. gdb claims that the
- signal handler for SIGALRM is never called.
-
- I wound up just copying the userLog to another file and overwriting
- the original, to break the lock that was causing the problem. At
- least rlogind will work in the meantime. I'll continue to try to look
- into the problem. If anyone knows of any recent changes to signals,
- interval timers, or anything else that might account for this change
- in behavior, please let me know. (Recent == past few months.)
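- For reference, the timeout pattern described above (an interval timer
- that delivers SIGALRM to break out of a blocking flock) can be sketched
- in POSIX C as follows. TimedFlock is a hypothetical name, not rlogind's
- code. The handler is installed without SA_RESTART so that the
- interrupted flock() returns EINTR instead of being restarted, and the
- timer is one-shot (it_interval left at zero).

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/file.h>
#include <sys/time.h>

/* The handler does nothing; delivering SIGALRM is enough to interrupt flock(). */
static void OnAlarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: take an exclusive flock() on fd, giving up after
 * `seconds` (which must be positive).  Returns 0 on success, or -1 with
 * errno == EINTR on timeout (another errno on other failures).
 */
int TimedFlock(int fd, int seconds)
{
    struct sigaction sa, oldSa;
    struct itimerval timer;
    int result, savedErrno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = OnAlarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART, so flock() sees EINTR */
    if (sigaction(SIGALRM, &sa, &oldSa) != 0) {
        return -1;
    }

    /* One-shot timer: it_interval stays <0,0>, so it fires only once. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = seconds;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0) {
        savedErrno = errno;
        sigaction(SIGALRM, &oldSa, NULL);
        errno = savedErrno;
        return -1;
    }

    result = flock(fd, LOCK_EX);
    savedErrno = errno;

    /* Cancel any pending timer and put the old handler back. */
    memset(&timer, 0, sizeof(timer));
    setitimer(ITIMER_REAL, &timer, NULL);
    sigaction(SIGALRM, &oldSa, NULL);

    errno = savedErrno;
    return result;
}
```

- The timer could in principle fire before flock() is entered, in which
- case the call would block until some other signal arrives; a production
- version would need to close that window.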
-
-
- 95.
- Subject: bug: rlogin ~^Z incompatible
- Date: Sun, 09 Apr 89 17:56:21 PDT
- From: Fred Douglis <douglis>
-
- Under unix, my understanding is that ~^Z stops the rlogin without
- output continuing from it, while ~^Y stops it but lets output
- continue. Under sprite, ~^Z causes output to continue, which can be
- pretty annoying....
-
-
- 96.
- Date: Tue, 11 Apr 89 20:15:24 PDT
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: tx bug
-
- Tx jumps into the debugger if you type the following command followed by
- a carriage return:
-
- ~brent/bin/read -help
-
-
- 97.
- Subject: copy-on-write crashes
- Date: Wed, 12 Apr 89 15:22:16 PDT
- From: Fred Douglis <douglis>
-
- Paprika has crashed twice in the past two days with the message:
-
- "COW: numCORPages < 0"
-
- This seems to happen when I fork children from emacs and then the
- parent emacs process exits. They all share a large address space,
- which is mostly untouched by the children (they're sitting around
- doing Fs_Dispatches). The children are exiting at just about
- the same time.
-
- It's not repeatable, or at least I don't know yet what might make it
- repeatable. Sorry. If anyone has any interest in pursuing the
- problem, or has any insight into what could cause it, please let me
- know. There's a kernel core dump in mendel's tmp directory on /b.
-
-
- 98.
- Date: Wed, 12 Apr 89 22:20:38 PDT
- From: gibson (Garth Gibson)
- Subject: vi problem
-
- I am logged in from home, editing a file on a sprite disk using vi.
- I wanted to do many instances of a simple change - search for last
- pattern, repeat last change. Of course, the screen redraw fell way
- behind. Then everything just stopped. Control C had no effect, neither
- did ESC or control L. I could ~~^Z back to unix and re-login to basil.
- Ps said the vi was in RWAIT. I looked at the file and it appeared quite
- old (I do periodic :w in vi out of paranoid habit).
- I will blow the process away and redo what I lost.
-
-
- 99.
- Date: Wed, 12 Apr 89 22:23:47 PDT
- From: gibson (Garth Gibson)
- Subject: vi problem revisited
-
- This may be an rlogin problem (overflow input buffer?). When I killed
- the process, "Killed" was displayed and I got a new prompt, but all
- keystrokes were still having no effect. I'll kill the login.
- garth
-
-
- 100.
- Subject: bug: signal deadlock
- Date: Thu, 13 Apr 89 11:01:37 PDT
- From: Fred Douglis <douglis>
-
- I was running gdb under tx when I decided to restart the debuggee.
- The tx window went dead (no input, no menu highlighting, whatever).
- When I tried running programs from other windows, one by one they
- completed but didn't return to the shell. An l1-p showed they were in
- the exiting state, and it showed that there was a Proc_ServerProc and
- a csh waiting on the sig monitor lock. I couldn't find any other
- processes waiting on static locks (things I could find in an nm
- listing).
-
-
- 101.
- Subject: interval timer bug (rlogind)
- Date: Fri, 14 Apr 89 00:48:20 PDT
- From: Fred Douglis <douglis>
-
- I noticed that the rlogind hanging bug had returned. I poked around
- in the kernel and discovered that the reason rlogind was ready so
- often, rather than waiting forever, was that it was getting signalled
- every 20 microseconds. This was due to a bug in procTimer.c that set
- an interval of <0,0> to <0,20> -- it would be correct to set 1
- microsecond to 20 (the minimum timer resolution), but not 0, which
- indicates the timer should only be hit once.
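- The intended rule can be written as a small C function. The name
- NormalizeInterval and the constant TIMER_RESOLUTION_USEC are
- hypothetical; the 20-microsecond value is taken from the report.

```c
#include <sys/time.h>

/* Minimum repeat interval the timer can deliver, per the report. */
#define TIMER_RESOLUTION_USEC 20

/*
 * Normalize a requested repeat interval.  <0,0> means "one-shot" and must
 * be left alone; any other interval shorter than the resolution is rounded
 * up to it.  The bug described above rounded <0,0> up as well, turning a
 * one-shot timer into one that fires every 20 microseconds.
 */
struct timeval NormalizeInterval(struct timeval interval)
{
    if (interval.tv_sec == 0 && interval.tv_usec == 0) {
        return interval;                        /* one-shot: no repeat */
    }
    if (interval.tv_sec == 0 && interval.tv_usec < TIMER_RESOLUTION_USEC) {
        interval.tv_usec = TIMER_RESOLUTION_USEC;
    }
    return interval;
}
```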
-
-
- 102.
- Date: Sat, 15 Apr 89 17:40:43 PDT
- From: mendel@sprite.Berkeley.EDU (Mendel Rosenblum)
- Subject: file system deadlock bug
-
- Sprite deadlocks when you try to unmount a disk with the prefix command:
- prefix -U /local
-
- The deadlock is as follows:
-
- Fs_Command calls Fs_PrefixClear which grabs the prefixLock monitor
- lock.
-
- Fs_PrefixClear calls FsPrefixHandleClose which also grabs the
- prefixLock monitor lock.
-
-
- 103.
- Subject: bug: recovery affects pdev access times
- Date: Tue, 18 Apr 89 15:18:29 PDT
- From: Fred Douglis <douglis>
-
- When oregano rebooted a few minutes ago, apparently every active
- rlogin pseudo-device got reset. Therefore, a finger on sprite lists 5
- rlogin connections as having identical idle times (40 minutes or so,
- which is when oregano rebooted) and the only rlogins with different
- idle times are those that have been active in the past 40 minutes.
-
-
- 104.
- Subject: recovery bug
- Date: Mon, 24 Apr 89 12:56:43 PDT
- From: Fred Douglis <douglis>
-
- Paprika has been going through the following recovery loop for a
- while: it finds out mace is up, it finds some locked handles and
- prints GetNextHandle skipping this that and the other thing, it tries
- to recover something with mace and gets a timeout, and decides mint is
- dead:
-
-
- 105.
- Subject: bug: null object file
- Date: Mon, 24 Apr 89 15:57:24 PDT
- From: Fred Douglis <douglis>
-
- I just did a compilation and wound up with a .o file full of nulls.
- No idea whether it was done locally or via migration, or what might
- have caused this bizarre behavior. I compiled everything in a
- directory and the others are apparently okay (at least ld complained
- only about the next-to-last one it looked at). I'd be interested in
- hearing if anyone else notices this sort of behavior. Also, I looked
- very briefly in the sprite log to see if this had been reported before
- -- it seems slightly familiar -- but I couldn't find anything under
- some obvious keywords.
-
-
- 106.
- Date: Mon, 24 Apr 89 17:47:34 PDT
- From: jhh (John H. Hartman)
- Subject: mx bug
-
-
- I typed "ESC F" (goto search string and delete what's there) and the entire
- mx window died with the following error:
-
- thyme<jhh 333> X Error: parameter mismatch
- Request Major code 42
- Request Minor code
- ResourceID 0xb00079
- Error Serial #905
- Current Serial #905
-
-
-
-
- 107.
- Subject: sendmail bug: mail stuck in queue
- Date: Fri, 28 Apr 89 16:14:07 PDT
- From: Fred Douglis <douglis>
-
- Mail to *.dec.com is apparently getting stuck in the mail queue. I
- confirmed with Mike that mail to mnelson%decwrl.dec.com@ginger got
- through, though mail from sprite is not. No reason why so far -- I
- haven't debugged sendmail -- but you might want to redirect your mail
- via a unix machine for the time being.
-
-
- 108.
- Subject: mint's ipserver died / disk full msgs
- Date: Sat, 29 Apr 89 11:18:56 PDT
- From: Fred Douglis <douglis>
-
- at 2:40am murder rebooted and mint printed out many messages about
- domain alloc failed. at the end, the printer wasn't keeping up, so
- messages were lost, possibly saying something about the ipserver, so I
- couldn't find out why the ipserver disappeared. The half-hourly
- message was printed at 3am, and immediately after that inetd
- complained about select errors and exited. I couldn't check ip.out
- because somewhere along the line "/hosts/mint/restartservers" got
- changed to overwrite ip.out rather than append to it, and the old
- version was lost before I got back downstairs to look at it.
-
- I don't know what to do about the ipserver's random skittishness, but
- I do have a suggestion about the console message problem: can the
- "Domain Alloc Failed" message be counted (and have a message about
- which domain it's talking about), so if the same message comes up many
- times, it only gets printed once before the domain empties again?
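- One way the suggested counting could work, sketched in C with
- hypothetical names and a hypothetical fixed number of domains: print
- the first failure for a domain, count the repeats, and emit a one-line
- summary once space is freed in that domain.

```c
#include <stdio.h>

#define MAX_DOMAINS 8   /* hypothetical number of filesystem domains */

/* Per-domain count of "Domain Alloc Failed" events since space was last freed. */
static int allocFailures[MAX_DOMAINS];

/*
 * Record an allocation failure for `domain`.  Only the first failure is
 * printed; later ones are just counted.  Returns 1 if a message was printed.
 */
int ReportAllocFailed(int domain)
{
    if (domain < 0 || domain >= MAX_DOMAINS) {
        return 0;
    }
    allocFailures[domain]++;
    if (allocFailures[domain] == 1) {
        printf("Domain Alloc Failed: domain %d\n", domain);
        return 1;
    }
    return 0;
}

/* Call when the domain has free space again; summarize suppressed repeats. */
void DomainSpaceFreed(int domain)
{
    if (domain < 0 || domain >= MAX_DOMAINS) {
        return;
    }
    if (allocFailures[domain] > 1) {
        printf("Domain %d: %d further alloc failures suppressed\n",
               domain, allocFailures[domain] - 1);
    }
    allocFailures[domain] = 0;
}
```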
-
-
- 109.
- Date: Sat, 29 Apr 89 12:11:37 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in fsmake
-
- The file system assumes that the disk label is copied to the first block
- of each partition. Fsmake doesn't do this.
-
-
- 110.
- Subject: fscheck causing extraneous reboots?
- Date: Sat, 29 Apr 89 14:22:01 PDT
- From: Fred Douglis <douglis>
-
- Is fscheck causing mint to reboot unnecessarily? I went to see why
- mint was taking so long to reboot (its RPC system wedged after some
- recovery error mendel hit while rebooting murder; debugging caused a
- watchdog reset before anything could be determined). It had rebooted
- after checking the root even though nothing was printed out about
- problems with the root. If the data block bitmap being different on
- disk is the only thing, is it necessary to shut down and reboot? (It
- didn't even complain about that, but it seemed like the likeliest
- problem.)
-
-
- 111.
- Date: Sat, 29 Apr 89 14:58:22 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8904292158.AA397601@sprite.Berkeley.EDU>
- To: sprite
- Subject: bug2 is fscheck
-
- fscheck writes the disk when it finds duplicate blocks even if the
- -write flag is not specified.
-
-
- 112.
- Date: Mon, 1 May 89 12:08:15 PDT
- From: jhh (John H. Hartman)
- Subject: thyme's ipserver died
-
- My ipserver died with a bus error in malloc(). It looks like it
- was trying to do a large allocation and the current memory pointer
- was bad. I don't really know because it wasn't linked with the
- debugging version of libc. I had a problem with the ipserver dying
- because its timer callback queue was messed up. My guess is there
- is a wild pointer somewhere.
-
-
- 113.
- Subject: bug: sendmail zeroed memory
- Date: Tue, 02 May 89 10:42:35 PDT
- From: Fred Douglis <douglis>
-
- Sendmail occasionally goes into the debugger with a bus error trying
- to dereference a null pointer when rewriting addresses. Turns out
- some data structures that are normally initialized from the .cf file
- are all zeroed out. Unfortunately, I still don't have a recreatable
- test case, but I do know that the bug only seems to appear when
- sending mail to internet hosts that are probably not in the host table
- (i.e., a lengthy name server lookup may be required). Also, the
- sendmail process that hits a bus error is actually the child of a
- process that initialized the data structures, so it's conceivable (but
- unlikely) that the bug is in VM rather than in sendmail itself.
-
-
- 114.
- Date: Tue, 2 May 89 16:48:10 PDT
- From: douglis (Fred Douglis)
- Subject: bug: ipServer memory leak?
-
- For the past couple of days, just about any time I've used the internet
- from paprika (sending mail, printing files, etc) my system would hang up.
- I checked the ipServer and it had a resident set of almost 2 megs with
- a total memory image of 5 megs. paprika had been up since sometime
- over the weekend, I think. other hosts don't show enormous ipServers,
- but perhaps this is because I use unix X applications talking over TCP
- to my host, and because I've been printing things on paprika from Unix.
-
-
- 115.
- Subject: bug: locked sendmail files
- Date: Wed, 03 May 89 11:58:25 PDT
- From: Fred Douglis <douglis>
-
- I did a mailq and found a lot of locked files in the queue, dating
- back to this morning before mint rebooted. Anyone know anything about
- this?
-
-
- 116.
- Subject: bug: "swap down" error
- Date: Fri, 05 May 89 09:59:41 PDT
- From: Fred Douglis <douglis>
-
- I found that processes migrating to basil were getting stuck -- not
- running, not killable, nuttin'. I saw Garth wasn't around, so I threw
- basil into the debugger. (Sorry, Garth -- when I continued basil, it
- panicked with a complaint that "current process is nil" -- maybe kgdb
- didn't continue it properly after I changed processes?)
-
- The migrated process was stuck in an unkillable state because
- "swapDown" was set and it was waiting for someone to notify it that
- the swap area isn't down. Of course, we all know /c is just fine
- right now, so basil somehow got fairly confused.
-
-
- 117.
- Date: Fri, 5 May 89 23:27:25 PDT
- From: jhh (John H. Hartman)
- Subject: bugs in malloc()
-
- I ran a user level program that tries to malloc a giant piece of
- memory. Two problems occurred: 1) The call in MemChunkAlloc to
- sbrk failed (correctly) but MemChunkAlloc called panic. Shouldn't
- malloc return 0 rather than terminate the process? 2) Panic calls
- fprintf, which eventually calls StdioFileWriteProc. Since nothing
- has been written to stderr yet, StdioFileWriteProc calls (you
- guessed it) malloc to allocate a buffer. This is very bad. Stderr
- should not be buffered. If, however, we get rid of the call to
- panic both of these problems go away. Any comments?
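- A minimal sketch of the behavior proposed above, with hypothetical
- names (ChunkAlloc, ReportNoMemory) rather than the real MemChunkAlloc:
- when sbrk() fails, the allocator reports the failure with an unbuffered
- write(2) and returns NULL, so the error path never calls malloc again.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write a diagnostic straight to file descriptor 2, so the error path
 * needs no stdio buffer and therefore never calls malloc.
 */
static void ReportNoMemory(void)
{
    static const char msg[] = "malloc: out of memory\n";
    (void)write(2, msg, sizeof(msg) - 1);
}

/*
 * Hypothetical chunk allocator: grow the heap with sbrk() and, if that
 * fails, report and return NULL instead of calling panic().
 */
void *ChunkAlloc(size_t size)
{
    void *chunk;

    if (size > (size_t)INTPTR_MAX) {
        ReportNoMemory();
        return NULL;
    }
    chunk = sbrk((intptr_t)size);
    if (chunk == (void *)-1) {
        ReportNoMemory();
        return NULL;   /* lets malloc() return 0 to its caller */
    }
    return chunk;
}
```

- ISO C already requires that stderr not be fully buffered, which
- supports the point that StdioFileWriteProc should not need to allocate
- a buffer for it.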
-
-
- 118.
- Subject: bug: fs consistency hanging
- Date: Mon, 08 May 89 12:25:30 PDT
- From: Fred Douglis <douglis>
-
- Andreas reported that he couldn't get a login working, and it turned
- out that opens and stats of "~stolcke/.cshrc" were hanging. I
- debugged mint and found that everyone was waiting on a
- CONSIST_IN_PROGRESS that didn't seem to exist (I didn't find anyone
- actually in the middle of consistency). When I went to reboot mint, I
- saw something in its syslog about consistency with fenugreek for this
- file timing out, so it looks like somehow the flag didn't get reset
- properly. Furthermore, fenugreek was getting lots of timeouts
- followed by "fenugreek is up", which implies that maybe fenugreek's
- channels for communication with mint were hung up, perhaps by all the
- pdev-related operations Andreas was doing. (Mary is seeing many
- pdev-related syslog messages during recovery).
-
-
- 119.
- Date: Sun, 7 May 89 11:58:18 PDT
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: bug: pfs callback deadlocked oregano
-
- Oregano locked up sometime late yesterday or early today, with just
- about every process blocking on the prefixLock. Thanks to John's
- lock information, I was able to figure out which process was actually
- holding the prefix lock (a good case for leaving this information
- available *all the time*).
-
- Someone called Fs_PrefixDump, which locked the prefix monitor and called
- FsDomainInfo. This called FsPfsGetAttrPath, which called FsPseudoStreamLookup,
- which called RequestResponse.
-
- In the meantime, there were various other things, like reopens, going on.
- I couldn't figure out how to save my gdb window once it was going, so
- I can't provide a full backtrace, but the gist was that someone was
- trying to reopen a file and was blocked because the handle was locked;
- someone else was trying to delete something and was blocked on the handle,
- etc. I didn't pay much attention to this once I found the prefix table
- lock held down during the callback.
-
- One more bug, while I'm at it: saying "boot" without any arguments
- just hangs on oregano, and booting from ginger results in it shutting
- down and rebooting unsuccessfully from its local disk if there's an error
- on the root.
-
- 120.
- Subject: bug: missed notification on packet output?
- Date: Tue, 09 May 89 10:07:22 PDT
- From: Fred Douglis <douglis>
-
- Wei reported that a migrated process got wedged, and I found that it
- was stuck doing a remote write to its home machine -- the thing is, it
- was stuck in the low-level network code, waiting to be told a packet
- had been output, rather than in the RPC code as I had expected. I
- called Brent and he's looking at it, but I figured I'd record the bug
- to make sure it's on the bug list.
-
- 121.
- Subject: bug: lprm doesn't stop job in process
- Date: Tue, 09 May 89 11:16:39 PDT
- From: Fred Douglis <douglis>
-
- I accidentally sent a 35-page job to the printer when I meant to
- select only a page from it. When I did an lprm a moment later, it
- claimed to remove the job, but it came out nevertheless. I believe
- that in Unix I have been able to stop jobs even after they have
- started printing, and certainly before they start printing.
-
- 122.
- Date: Wed, 10 May 89 23:21:37 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: bug in DevNet_FsOpen
-
-
- Perhaps Mendel's new dev implementation fixes this, but I thought
- I'd better report it anyway. DevNet_FsOpen calls malloc with a
- semaphore held (protoMutex). If vmMonitorLock is held you go into
- the idle loop with the interrupts off. I guess this doesn't usually
- happen on a sun, but it just did on the spur.
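-
- The usual cure is to allocate before taking the lock, and to hold the lock
- only while linking the new object in. The sketch below uses POSIX threads
- in place of Sprite's kernel locks. ProtoEntry, protoList, and ProtoOpen are
- hypothetical names; only protoMutex comes from the report.
-
```c
#include <pthread.h>
#include <stdlib.h>

typedef struct ProtoEntry {
    int                protocol;
    struct ProtoEntry *nextPtr;
} ProtoEntry;

static pthread_mutex_t protoMutex = PTHREAD_MUTEX_INITIALIZER;
static ProtoEntry     *protoList = NULL;

/*
 * Add an entry for protocol.  The allocation happens before protoMutex is
 * taken, so the allocator never runs (or blocks) with the lock held.
 * Returns the new entry, or NULL if memory is exhausted.
 */
ProtoEntry *ProtoOpen(int protocol)
{
    ProtoEntry *entryPtr = malloc(sizeof(*entryPtr));

    if (entryPtr == NULL) {
        return NULL;
    }
    entryPtr->protocol = protocol;

    pthread_mutex_lock(&protoMutex);
    entryPtr->nextPtr = protoList;     /* only list surgery under the lock */
    protoList = entryPtr;
    pthread_mutex_unlock(&protoMutex);
    return entryPtr;
}
```
-
- Suppose the real routine must first check, under the lock, whether an entry
- already exists. In that case the spare allocation can be freed after the
- lock is released.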
-
-
-
- 123.
- Date: Fri, 12 May 89 08:09:39 PDT
- From: douglis (Fred Douglis)
- Subject: migInfo file locked again (bug)
-
- something must have hung, been suspended, or been thrown into the
- debugger with the lock to the migInfo file held, because Wei sent me
- mail last night commenting that after relinking with the fixed
- node selection code, the time to select an idle host went to 10 seconds!
- I looked around for obvious candidates, didn't find any, and instead
- copied the file back to itself and restarted as many loadavg daemons
- as I could.
-
- Another case for using a server-based model instead of a single shared
- file, as far as I'm concerned.
-
-
- 124.
- Date: Fri, 12 May 89 08:12:30 PDT
- From: douglis (Fred Douglis)
- Subject: bug: can't rlogin to mustard
-
- When restarting all the daemons, I found I couldn't rlogin to mustard.
- migrating to it works fine and lets me list the running processes, which
- include ipServer and inetd. Any ideas? It will be listed as "down" until
- someone kills the old loadavg and starts a new "loadavg -dv" process.
-
-
- 125.
- Subject: bug: murder power-on-reset
- Date: Fri, 12 May 89 16:59:38 PDT
- From: Fred Douglis <douglis>
-
- Murder bit the big one earlier today when its ethernet cable popped
- out and then was reconnected. Is this a software fault or a problem
- with the hardware??
-
-
- 126.
- Subject: bug: repeated obituaries
- Date: Mon, 15 May 89 21:26:49 PDT
- From: Fred Douglis <douglis>
-
- It's a little distracting to see "mace considered dead" once every
- minute or two. I can't imagine that the system thinks mace has gone
- from being alive to being dead, so there must be a bug that's causing
- it to say mace is considered dead when it's already dead. This may be
- tied to the fact that someone is probably trying to write over a
- pseudo-device to mace with some probability, once per minute.
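-
- One way to avoid the repeated message, sketched under the assumption that
- each host's liveness is tracked in a simple state variable. The idea is to
- print the obituary only on a transition from alive to dead, and to stay
- quiet when a failed operation merely confirms that a dead host is still
- dead. Host, HostState, and HostUpdate are hypothetical names.
-
```c
#include <stdio.h>

typedef enum { HOST_ALIVE, HOST_DEAD } HostState;

typedef struct {
    const char *name;
    HostState   state;
} Host;

static int obituariesPrinted = 0;   /* counts messages, for testing */

/*
 * Record the latest observation of a host.  A message is printed only
 * when the state actually changes, so a host that stays dead is
 * reported once no matter how often something tries to reach it.
 */
void HostUpdate(Host *hostPtr, HostState observed)
{
    if (observed == hostPtr->state) {
        return;                     /* no transition: stay quiet */
    }
    hostPtr->state = observed;
    if (observed == HOST_DEAD) {
        printf("%s considered dead\n", hostPtr->name);
        obituariesPrinted++;
    } else {
        printf("%s is up\n", hostPtr->name);
    }
}
```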
-
-
- 127.
- Date: Mon, 15 May 89 21:52:01 PDT
- From: pmchen (Peter M. Chen)
- Subject: new user report
-
- Bugs:
- Before I got X running, I was using the console window:
- 1) more doesn't work
- TIOCLGET: invalid argument
- 2) vi doesn't work
- I vi'ed a file, then edited, then ctrl. Z, then foregrounded (%)
- When I foregrounded, the most recent change was gone. Also
- when I foregrounded, the screen paused until I hit a key.
- 3) set filec doesn't work (it does under tx and X)
-
- Once I got X running, life was much better. I still had some problems,
- though:
- 1) mouse movement is skewed (when I move the mouse vertically up,
- it goes at about a 10-degree angle to the right).
- 2) caps lock doesn't work (nor F1)
- 3) df prints out wrong information for nfs mounted file systems
- 4) "ls -F" lists symbolic links to directories as directories instead
- of symbolic links. E.g. ls /spur2/pmchen lists 262@ from unix
- but 262/ from sprite. This isn't necessarily a problem, but
- it is different from unix.
-
- Good things about sprite and tx:
- 1) tx looks nicer, and the fonts can be smaller with seemingly better
- resolution
- 2) vi printing response time seems faster under tx than xterm
- 3) my machine beeps (ctrl. G), which it never was able to do before (even
- under raw console)
- 4) tx is better at cutting and pasting than xterm
- 5) once you get X running, most things seem to work right away
-
- tx and uwm wish list:
- 1) xterm lights up the window that you're working in (in the title bar
- section). Can tx?
- 2) I'd like to save screen space and get rid of the command window
- and the "Control Search Selection" window. Why not use (as xterm)
- ctrl. mouse to get the Control, Search, and Selection?
- 3) xterm has a menu item to reset the terminal, which tx doesn't. It
- comes in real handy sometimes.
- 4) I'd like to be able to dynamically change the title of a tx window
- 5) I'd like to have deiconify warp
-
- Questions:
- 1) can I get named pipes (like the unix command mknod)?
- 2) how do you use xbiff? I see it in ~douglis/cmds.sun3, but I can't
- make it work
- 3) is there a proofer (such as xproof)?
- 4) is there an easy way to exit out of X?
- 5) are there common places to look for utilities and help without bugging
- you guys?
-
-
- 128.
- Date: Tue, 16 May 89 14:09:30 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: prefix bug
-
-
- This may be a feature, but I consider it a bug. Last night oregano
- would boot as root and export /a, /b, /c. In oregano's /local (which
- was serving as / ) there weren't any remote links for /a, /b, and
- /c. Oregano did not complain about this and on oregano I was able
- to cd to these directories. Other machines couldn't find them and
- would not boot. It took me a while to figure this one out. I don't
- think prefix should allow you to export a prefix that doesn't have
- a remote link. If you just want to change the name of a prefix on
- a particular machine you can do the same thing with a symbolic
- link, rather than the prefix command.
-
-
- 129.
- Subject: fs bug: bogus type
- Date: Wed, 17 May 89 14:42:04 PDT
- From: Fred Douglis <douglis>
-
- paprika crashed a short time ago with an address error resulting from
- Fs_GetAttributes calling a routine based on an invalid type (32). The
- core file is in /c/tmp/mendel/vmcore if that would be of use to Brent
- (please delete if not). Sounds like some checks for bogus types would
- be useful.
-
-
- 130.
- Date: Sun, 21 May 89 22:01:40 PDT
- From: douglis (Fred Douglis)
- Subject: bug: nawk & gawk incompatible
-
- gawk was installed, and nawk removed, but a script that works with nawk
- doesn't work with gawk. I believe it's because nawk allows variables to
- be defined on the command line. Check out ~douglis/bin/KernelVersions
- for an example of a command that produces no output using gawk.
-
- 131.
- Subject: bug (sort of): gcc & float
- Date: Mon, 22 May 89 00:06:19 PDT
- From: Fred Douglis <douglis>
-
- it seems that a number of programs that compile just fine under sunos
- using the std. cc produce incorrect code under gcc, due to the use of
- "float" v. "double". does anyone know whether other versions of the C
- library (pre-ANSI) use floats instead of doubles, or something?
- Andreas reported that "pic" produced bad code because of this, and now
- I found that ggraph produced a garbage graph under sprite, and has
- lots of use of floats. I also am starting to think my trouble with
- TeX is due to gcc v. whatever everyone else uses.
-
-
- 132.
- Subject: bug: non-ready process
- Date: Wed, 24 May 89 11:02:14 PDT
- From: Fred Douglis <douglis>
-
- paprika just crashed with a "non-ready process in ready queue",
- followed by a deadlock syncing the disks, followed by a deadlock on
- sched_Mutex, followed by aborting and requiring a watchdog reset to
- stop being comatose.
-
-
- 133.
- Subject: bug: tftpboot broken again
- Date: Thu, 25 May 89 11:27:16 PDT
- From: Fred Douglis <douglis>
-
- a kernel that runs fine from unix gets "exception 10" immediately
- after booting from mint.
-
-
- 134.
- Date: Sat, 3 Jun 89 16:09:14 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: bug in man macros
-
- The .VS macro starts sidebars, but the .VE doesn't seem to stop them.
- They continue to the end of the document.
-
-
- 135.
- Date: Mon, 5 Jun 89 17:48:11 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: migration bug
-
- Just for fun I decided to migrate a load of the spur kernel away
- from my host while it was running. I typed "mig -p <processid>"
- and it was migrated to mustard. When the load completed, thyme
- thought the size of the resulting kernel was about 1 Mb, while the
- rest of the system including the fileserver thought it was its
- normal 4 Mb. I think the size on thyme is the size of the file at
- the time the migration occurred.
-
-
- 136.
- Date: Fri, 9 Jun 89 15:01:37 PDT
- From: douglis (Fred Douglis)
- Subject: bug: exclusive access to console when using debugger
-
- I was catting /hosts/sloth/dev/syslog when I tried to attach to sloth
- using kgdb. I had to interrupt the cat process and reattach in
- order to get through the attachment procedure. Before that, it just
- hung indefinitely.
-
-
- 137.
- Date: Fri, 9 Jun 89 15:12:51 PDT
- From: brent (Brent Welch)
- Subject: bug main_ variables
-
- We should clean up how the various main_ variables
- are declared and set. Now that we have Main_InitVars
- there is no reason to have:
- char *main_HomeDir = "/";
-
- /*
- * Flags to modify main's behavior. Can be changed without recompiling
- * by using adb to modify the binary.
- */
- Boolean main_Debug = FALSE; /* If TRUE then enter the debugger */
- Boolean main_DoProf = FALSE; /* If TRUE then start profiling */
- Boolean main_DoDumpInit = TRUE; /* If TRUE then initialize dump routines */
- int main_NumRpcServers = 2; /* # of rpc servers to create */
- char *main_AltInit = NULL; /* If non-null then contains name of
- * alternate init program to use. */
- Boolean main_AllowNMI = FALSE; /* If TRUE, allow non-maskable interrupts.*/
-
- Instead, they should be set in Main_InitVars, like I do in my mainHook.c file.
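-
- A sketch of the pattern Brent describes. It assumes Main_InitVars is called
- once at the start of main, before any of the flags are read. The variable
- names come from the listing above. The Boolean typedef and the function
- body are stand-ins, not the contents of mainHook.c.
-
```c
#include <stddef.h>

typedef int Boolean;   /* stand-in for Sprite's Boolean type */
#define TRUE  1
#define FALSE 0

/* Declared without initializers; Main_InitVars supplies the values. */
char    *main_HomeDir;
Boolean  main_Debug;          /* If TRUE then enter the debugger */
Boolean  main_DoProf;         /* If TRUE then start profiling */
Boolean  main_DoDumpInit;     /* If TRUE then initialize dump routines */
int      main_NumRpcServers;  /* # of rpc servers to create */
char    *main_AltInit;        /* If non-NULL, name of alternate init program */
Boolean  main_AllowNMI;       /* If TRUE, allow non-maskable interrupts */

/*
 * Set every main_ variable in one place.  A customized kernel could
 * presumably supply its own version of this routine to change defaults.
 */
void Main_InitVars(void)
{
    main_HomeDir = "/";
    main_Debug = FALSE;
    main_DoProf = FALSE;
    main_DoDumpInit = TRUE;
    main_NumRpcServers = 2;
    main_AltInit = NULL;
    main_AllowNMI = FALSE;
}
```
-
- One trade-off: values assigned in code are no longer initialized data.
- As a result, they cannot be patched in the binary with adb, as the
- existing comment describes.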
-
-
- 138.
- Subject: bug: when the disk fills ...
- Date: Tue, 13 Jun 89 11:44:11 PDT
- From: Fred Douglis <douglis>
-
- I know this has been brought up in the past, and I thought measures
- had been taken. If so, they weren't sufficient: when I filled up /a,
- my host became entirely unusable because it was printing "domain full"
- messages as quickly as it could (on the display because the syslog
- window couldn't keep up), and I couldn't get in to remove anything.
-
- How about associating a bit with each file that says whether it has
- been unsuccessfully flushed to disk? Each file could be printed out
- only once that way. The other thing is, when the disk fills up, the
- client could try waiting a while before flushing again. If the client
- can't do anything else in the meantime because its cache is full of
- dirty data, then it could wait rather than beating on the server while
- someone on another host tries deleting something.
-
- Whatever happened to the idea of checking the available space before
- filling up the cache? Seems like there must be a better way to handle
- this, and we should deal with this before we put more people on the
- system.
-
-
- 139.
- Date: Fri, 16 Jun 89 13:19:22 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bugs in fscheck and boot sequence
-
- During the boot sequence, if the file .fscheck.out does not exist,
- fscheck appears to write its output to the root directory of the file system
- being checked. The only recovery from this is remaking the file system.
- Fscheck doesn't appear to be able to fix a disk whose root directory
- was trashed.
-
- Also, the mkdir program should probably be added to /boot/cmds.
-
-
- 140.
- Date: Fri, 16 Jun 89 18:18:23 PDT
- From: douglis (Fred Douglis)
- Subject: bug oregano fscheck loop
-
- yet again, oregano would not reboot. apparently someone started it
- rebooting around 5:15 this afternoon without notifying anyone else and
- without sticking around to look at it; John O. and I wandered up there
- and saw it was rebooting, and left it alone until I decided it wasn't
- getting anywhere. When I rebooted single-user, there were a few problems
- (like the $path wasn't set up to execute anything!) but i was able
- to attach /c and see that lost+found was full again. I tried
- creating and deleting lots of files, getting the size of the directory
- up to 16K and that still wasn't enough. i finally gave up and rebooted
- with a fastboot, so * /c still has not been checked *. this was after
- 3 or 4 attempts to get fscheck to complete without filling up lost+found.
-
-
- 141.
- Date: Sat, 17 Jun 89 13:27:08 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in fscheck
-
- The -hostID option of fscheck should allow the user to specify the hostID to
- set in the disk header. I made tonkawa's disk on murder so the hostID was
- set to 17. When I booted tonkawa, it initialized its hostID from the
- disk, so I couldn't change it. I had to L1-a tonkawa during the boot,
- set rpc_SpriteID to 15 from the PROM, continue the boot, and run fscheck.
-
-
- 142.
- Date: Mon, 19 Jun 89 21:39:06 PDT
- From: brent (Brent Welch)
- Subject: SendTimerSigFunc bug?
-
- Mendel had complained that the timer queue was filling up
- in the new kernels. I did some debugging and noticed
- many entries due to SendTimerSigFunc, which is used for
- process interval timers. There is a level of indirection
- that must be followed to see this. SendTimerSigFunc is
- called from CallFuncFromTimer, which is the function in
- the timer queue. Anyway, it looks like some process is
- either way overusing the interval timer stuff, or some
- recent change has broken it and the timer reschedules
- itself incorrectly.
-
-
- 143.
- Date: Tue, 20 Jun 89 22:49:11 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: file permissions bug
-
- If I try to chmod /sprite/src/kernel I get :
-
- thyme-3# chmod 775 /sprite/src/kernel
- chmod: /sprite/src/kernel: too many levels of symbolic links
-
- Also, the file LOCK.make existed in /sprite/src/kernel and was owned by me :
-
- -rw-rw-r-- 1 jhh 0 Jun 2 14:28 LOCK.make
-
- but I could not delete it :
-
- rm LOCK.make
- rm: LOCK.make: permission denied
-
-
- 144.
- Date: Wed, 21 Jun 89 17:40:18 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8906220040.AA69899@sprite.Berkeley.EDU>
- To: sprite
- Subject: prefix bug
-
- I had /sprite/src/kernel attached to murder under both /sprite/src/kernel
- and /d. If you type
- cd /sprite/src/kernel/dev
- pwd
-
- you get /d/dev as output, rather than /sprite/src/kernel/dev. This breaks mkmf.
-
-
- 145.
- Date: Wed, 21 Jun 89 18:12:19 PDT
- From: douglis (Mary Gray Baker)
- Message-Id: <8906220112.AA733452@sprite.Berkeley.EDU>
- To: sprite
- Subject: bug in Vm_FindCode
-
- I got into a mode where any process trying to execute "sh" would hang
- in an unkillable state. This is because FindCode thinks someone else is already
- trying to allocate the segment, and it waits on a condition that never gets notified.
- Seems like this isn't an awfully high priority problem, but something worth
- thinking about...
-
-
- 146.
- Subject: bug with syslog
- Date: Thu, 22 Jun 89 12:32:44 PDT
- From: Fred Douglis <douglis>
-
- maybe related to the new changes in dev? the newer kernels get
- screwed up and only direct some output to the process that's catting
- /dev/syslog, with the rest going directly to the display.
-
-
- 147.
- Date: Thu, 22 Jun 89 17:13:35 PDT
- From: brent (Brent Welch)
- Subject: device reopen bug
-
- I have tested device reopening and it is ready to go,
- except that there is an obscure bug which I don't want
- to fix right now. The bug would only show up if you
- have a write-only stream to a remote syslog device,
- and the remote host reboots. Upon reopen the syslog
- device would erroneously be told the client has a
- read-write, not write-only, stream. This would confuse the
- syslog device because it is a single-reader device.
- (To fix this you'd have to close the write-only stream
- and reboot the server.)
-
-
- 148.
- Date: Fri, 23 Jun 89 14:14:13 PDT
- From: stolcke (Andreas Stolcke)
- Subject: spritemon
-
- When I tried to run spritemon recently on mint, it gave me a
- floating-point exception. I was rlogged in from a non-sprite sun4,
- but I don't see how that could have something to do with it.
-
-
- 149.
- Date: Sat, 24 Jun 89 15:54:44 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: tx bug
-
- It looks like tx windows are missing some refresh events. If I
- change the window under a dialog box (like the one that says I
- can't write the file), and then pick "continue", the underlying
- window is not refreshed.
-
-
- 150.
- Date: Sat, 24 Jun 89 18:16:47 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8906250116.AA397619@sprite.Berkeley.EDU>
- To: sprite
- Subject: recovery bug
-
- Every 30 seconds murder prints a message
-
- 6/24/89 17:17:31 basil (5) completed recovery
-
- in its syslog.
- Murder is running:
- SPRITE VERSION 1.0 (Brent sun3) (23 Jun 89 13:03:36)
- and basil is running
- SPRITE VERSION 1.0 (Brent sun3) (14 Jun 89 17:42:58)
-
-
- 151.
- Date: Sun, 25 Jun 89 15:07:10 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: spur vm bug
-
- On line 1476 in Vm_SegmentDup there is an unlock of the page pointed
- to by the destination PTE ptr. Unfortunately this page was not locked
- in the first place. Vm_SegmentDup was called by InitUserProc. I looked
- all through the vm code and was unable to find the place where the
- destination page is locked. Obviously this can't be the case, otherwise
- the code would never work. Could someone who understands the code
- better take a look at it and tell me where the page is locked?
-
-
- 152.
- Date: Sun, 25 Jun 89 18:25:28 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: netroute bug
-
- I can't install a route to tonkawa because rarp fails. I don't know
- why rosemary refuses to answer rarp requests, but it would be nice
- if I could specify the internet address of the host to netroute,
- or have netroute look in spritehosts. Also, what's the deal on the
- rarp daemon? Are our fileservers supposed to be running it, or do we
- depend on unix? Raid had a problem booting because no one responded to
- rarp. When I started the daemon on tonkawa the problem went away.
-
-
-
- 153.
- Date: Thu, 29 Jun 89 17:19:14 PDT
- From: ouster (John Ousterhout)
- Subject: Pmake bug?
-
- If I type "pmake cleanall" in /a/X/src/cmds/Xsprite, pmake hangs after
- printing the following information:
-
- mace: pmake cleanall
- --- cleansun2 ---
- pmake -l 'CC=cc' 'INSTALLDIR=/X/cmds' 'TM=sun3' TM=sun2 clean
- --- tidy ---
- %%% ddx %%%
- --- clean ---
- rm -f sun2.md/spriteBW2.o sun2.md/spriteCG2M.o sun2.md/spriteCursor.o
-   sun2.md/spriteGC.o sun2.md/spriteInit.o sun2.md/spriteIo.o sun2.md/spriteKbd.o
-   sun2.md/spriteMouse.o sun2.md/spriteBW2.po sun2.md/spriteCG2M.po
-   sun2.md/spriteCursor.po sun2.md/spriteGC.po sun2.md/spriteInit.po
-   sun2.md/spriteIo.po sun2.md/spriteKbd.po sun2.md/spriteMouse.po
-   sun2.md/linked.o sun2.md/linked.po *~ sun2.md/*~
-
- Control-C will unwedge Pmake, but the hang seems to be repeatable (i.e.
- there's no way to get "pmake cleanall" or "pmake clean" to complete).
-
-
- 154.
- Date: Thu, 29 Jun 89 18:04:36 PDT
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: bug: mint crash in Fs_PrefixDump
-
- Mint crashed with a bus error. because we don't keep sources under unix,
- i wasn't able to find out much about what was going on other than
- a backtrace and a local variable list. I dumped *prefixPtr and it
- was garbage (list pointing to 1 and 4 instead of normal addresses, and so
- on). This happened right after oregano had its problems with prefix-related
- operations hanging after the ipServer died.
-
- I rebooted mint with my new kernel, which I will copy over to rosemary as
- soon as mint comes back. (Mint had been running the JHH kernel, which has
- who-knows-what in it; my kernel has everything that's installed except the
- new change for the process timer free() bug, which would have eventually
- crashed mint in any other kernel.)
-
-
- 155.
- Subject: bug: ipServer looping?
- Date: Thu, 29 Jun 89 23:21:54 PDT
- From: Fred Douglis <douglis>
-
- I'm getting pretty awful response when logged in from home, and I
- noticed that the 5-minute load average is over 1 although there are
- none of the usual suspects (cc's and whatever) around. However, the
- ipServer seems to be in the READY state all or most of the time, at
- least while I am logged in. Has anyone else noticed this behavior?
-
-
- 156.
- Subject: bug: pmake messed up big time
- Date: Fri, 30 Jun 89 19:12:00 PDT
- From: Fred Douglis <douglis>
-
- see anything funny with this?
- cd /sprite/src/lib/c/mig/
- pmake -k debug
- --- sun3.md/Mig_ConfirmIdle.go ---
- rm -f sun3.md/Mig_ConfirmIdle.go
- cc -O -msun3 -I. -Isun3.md -g -c Mig_ConfirmIdle.c -o sun3.md/Mig_ConfirmIdle.go
- --- ../sun3.md/libc_g.a ---
- ar r ../sun3.md/libc_g.a sun3.md/Mig_ConfirmIdle.go
- ar: filename Mig_ConfirmIdle.go truncated to Mig_ConfirmIdle
- /sprite/cmds.sun3/ranlib ../sun3.md/libc_g.a
- --> rm -rf sun3.md/Mig/sprite/cmds.sun3/ranlib ../sun3.md/libc_g.a
- rm -rf sun3.md/MigAsciiToInternal.go sun3.md/MigGetLocalName.go
-   sun3.md/MigInternalToAscii.go sun3.md/Mig_ConfirmIdle.go sun3.md/Mig_Done.go
-   sun3.md/Mig_GetAllInfo.go sun3.md/Mig_GetIdleNode.go sun3.md/Mig_GetInfo.go
-   sun3.md/Mig_OpenInfo.go sun3.md/Mig_UpdateInfo.go
-
-
- 157.
- Date: Sun, 2 Jul 89 18:51:12 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: redirect bug
-
- If I try to move /tmp/goo to /a/attcmds/csh/sun4.md/csh, the sun3.new kernel
- crashes with a VmRawAlloc out of memory bug. It is dying in FsLookupRedirect
- at line 564 with a prefixLength that is total garbage.
-
-
- 158.
- From: rab (Robert A. Bruce)
- Subject: bug: makedepend
- Date: Mon, 03 Jul 89 23:43:44 PDT
-
- makedepend apparently goes into an infinite loop when I run mkmf
- in /a/newcmds/cc1.68k.
-
-
- 159.
- Date: Fri, 7 Jul 89 21:44:37 PDT
- From: douglis (Fred Douglis)
- Subject: bug: oregano died with leftover indirect block
-
- yet again. it was down for over an hour, including the time needed
- to check its disks when I rebooted.
-
- Mint crashed with the same complaint earlier today.
-
-
- 160.
- Date: Sun, 9 Jul 89 22:04:09 PDT
- From: brent (Brent Welch)
- Subject: pmake sun4 TM bug
-
- pmake on a sun4 doesn't default to TM=sun4 correctly, it defaults to sun3.
- However, on the plus side, I was able to compile and install a
- working rshd from anise for the sun4s.
-
-
- 161.
- Date: Mon, 10 Jul 89 13:30:21 PDT
- From: douglis (Fred Douglis)
- Subject: bug: FsRemoteDomainInfo: waiting for recovery
-
- this should probably time out instead of waiting for recovery. Otherwise,
- it seems that a down host can cause all operations involving the prefix
- table to hang indefinitely, including anything one might do to remove the
- offending entry in the first place.
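-
- A sketch of waiting with a timeout rather than indefinitely, using POSIX
- pthread_cond_timedwait. recoveryDone, WaitForRecovery, RecoveryFinished,
- and the mutex/condition pair are hypothetical stand-ins for whatever
- FsRemoteDomainInfo actually waits on. The caller gets ETIMEDOUT back and
- can fail the operation, so a down host need not pin the prefix table.
-
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t recoveryLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  recoveryCond = PTHREAD_COND_INITIALIZER;
static int             recoveryDone = 0;   /* set under recoveryLock */

/*
 * Wait up to timeoutSeconds for recovery to complete.  Returns 0 if
 * recovery finished, otherwise the error from pthread_cond_timedwait
 * (normally ETIMEDOUT).
 */
int WaitForRecovery(int timeoutSeconds)
{
    struct timespec deadline;
    int status = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);  /* condvar's default clock */
    deadline.tv_sec += timeoutSeconds;

    pthread_mutex_lock(&recoveryLock);
    while (!recoveryDone && status == 0) {     /* 0 covers spurious wakeups */
        status = pthread_cond_timedwait(&recoveryCond, &recoveryLock,
                                        &deadline);
    }
    if (recoveryDone) {
        status = 0;
    }
    pthread_mutex_unlock(&recoveryLock);
    return status;
}

/* Called by the recovery code once the server is back. */
void RecoveryFinished(void)
{
    pthread_mutex_lock(&recoveryLock);
    recoveryDone = 1;
    pthread_cond_broadcast(&recoveryCond);
    pthread_mutex_unlock(&recoveryLock);
}
```
-
- On older C libraries this needs to be linked with -pthread.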
-
-
- 162.
- Date: Mon, 10 Jul 89 18:25:49 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: assembler bug for sun4 assembling
-
- The sprite (gnu) assembler calls abort() when it sees a load or store
- instruction to an alternate space. This means I can't assemble most of the
- sun4 kernel assembly code since it's got a lot of loads and stores to
- control space, etc.
-
-
- 163.
- Date: Tue, 11 Jul 89 18:32:41 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: ld bug for linking sun4 stuff
-
- The linker gets a segmentation violation when I try to link my sun4 kernel.
- There could certainly be something wrong with the obj's I'm trying to link,
- but what the debugger is saying makes no sense.
-
-
- 164.
- Subject: bug: ggraph broken
- Date: Wed, 12 Jul 89 01:53:58 PDT
- From: Fred Douglis <douglis>
-
- the installed version gave me a bizarre line on an input file that
- generated a good graph on unix. remembering andreas's comment about
- floats and doubles in gcc, i tried recompiling after changing all
- floats to doubles, but this time i hit a bus error running ggraph.
-
-
- 165.
- Date: Wed, 12 Jul 89 12:43:39 PDT
- From: pmchen@sprite.Berkeley.EDU (Peter M. Chen)
- Subject: bug report on gettimeofday
-
- I seem to be going backwards in time once in a while. The following is a
- trace of my program.
-
- tp.tv_sec=616275411, tp.tv_usec=910000
- tp.tv_sec=616275410, tp.tv_usec=960000
-
- Note that in the last line, tv_sec has gone backwards one second. This
- seems to be consistent on tv_usec = 960000, but not every time. For example,
-
-
-
-
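- A clock that goes backwards like this can be detected mechanically by
- comparing each reading with the previous one. This is a minimal sketch;
- TimevalBefore and ClockRanBackwards are hypothetical helpers. The two
- values in the trace can be checked with the first of them.
-
```c
#include <stddef.h>
#include <sys/time.h>

/* Return nonzero if a is strictly earlier than b. */
int TimevalBefore(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec) {
        return a->tv_sec < b->tv_sec;
    }
    return a->tv_usec < b->tv_usec;
}

/*
 * Take two consecutive readings and return nonzero if the second is
 * earlier than the first, i.e. the clock appeared to run backwards.
 * Calling this in a loop is one way to catch an intermittent regression.
 */
int ClockRanBackwards(void)
{
    struct timeval first, second;

    gettimeofday(&first, NULL);
    gettimeofday(&second, NULL);
    return TimevalBefore(&second, &first);
}
```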
- 166.
- Date: Thu, 13 Jul 89 12:30:55 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: mx bug
-
- My mx window appeared to go into an infinite loop. It was in
- CharToLine, and the increment was flipping between -1 and 1. There
- is a core in my home directory named core.91a34 if someone wants
- to look at it.
-
-
- 167.
- Subject: bug? /hosts protections
- Date: Thu, 13 Jul 89 15:05:03 PDT
- From: Fred Douglis <douglis>
-
- Just about all the /hosts/*.EDU directories are mode 777. Anyone know
- why this is the case? Makes /hosts/.../nologin a bit of a problem.
-
-
- 168.
- Subject: bug: setpriority() not implemented
- Date: Thu, 13 Jul 89 17:13:03 PDT
- From: Fred Douglis <douglis>
-
- Garth, just a warning if you should use sprite for your simulations.
- the unix setpriority() call just returns success without doing
- anything. I think this may be because unix and sprite priorities are
- implemented differently. In sprite, a priority of "-1" means double
- all charged usage, while "-2" means quadruple it, and so on. Since
- unix priorities are linear instead of exponential, someone who picked two
- different unix priorities expecting one relative difference could get
- undesired consequences, since in sprite the relative difference would be greater.
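-
- The exponential scheme Fred describes can be sketched for the non-positive
- priorities he mentions, where each step below zero doubles the charged
- usage. The function name is hypothetical. Treating 0 as a factor of 1 is
- an assumption, and the message does not say what positive values do.
-
```c
/*
 * Factor by which charged CPU usage is multiplied for a Sprite priority
 * p <= 0, per the description: -1 doubles it, -2 quadruples it, and so
 * on.  Priority 0 is assumed to mean "charge normally" (factor 1).
 * Returns 0 for values this sketch does not model.
 */
unsigned long SpriteUsageFactor(int priority)
{
    if (priority > 0 || priority < -30) {
        return 0;   /* outside the range this sketch models */
    }
    return 1UL << (-priority);
}
```
-
- Under this scheme, Sprite priorities -1 and -3 differ by a factor of 4 in
- charged usage. A linear Unix mapping would not anticipate a gap of that
- size.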
-
-
- 169.
- Date: Thu, 13 Jul 89 23:59:38 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: pmake hanging bug
-
- Different parts of pmakes keep hanging randomly. If I kill and restart the
- pmake, it usually goes just fine. Perhaps it has to do with the choice of
- machines, since when it's restarted it usually gets a different machine.
-
-
- 170.
- Date: Fri, 14 Jul 89 12:11:10 PDT
- From: pmchen@sprite.Berkeley.EDU (Peter M. Chen)
- Message-Id: <8907141911.AA76841@sprite.Berkeley.EDU>
- To: /sprite/users/pmchen/mail/sprite/mbox, sprite@sprite.Berkeley.EDU
- Subject: su-suspend bug
-
- If I su, suspend the process, then fg it, the su process ends. Am I doing
- this wrong (i.e., do I need to do this differently than on UNIX)?
-
-
- 171.
- Date: Fri, 14 Jul 89 13:43:16 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Another gcc bug
-
- Gcc seems only to look at the size of a structure before determining whether
- to use byte, half-word or whole-word loads and stores for structure assignment.
- This doesn't take into account the alignment of the structure. The following code
- seg faults because it attempts to do half-word loads and stores on an odd
- boundary.
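-
- The rule Mary is pointing at can be shown in a few lines of C11. A
- structure made only of chars has size 2 but alignment 1, so choosing
- half-word accesses from the size alone is unsafe. AccessWidth is a
- hypothetical helper that takes both size and alignment into account; it is
- not gcc's actual code.
-
```c
#include <stdalign.h>
#include <stddef.h>

struct Pair {            /* size 2, but only byte-aligned */
    char a;
    char b;
};

/*
 * Widest load/store (1, 2, or 4 bytes) that can copy an object of the
 * given size and alignment using only aligned accesses.  A size-only
 * rule would pick 2 for struct Pair; this picks 1.
 */
size_t AccessWidth(size_t size, size_t align)
{
    size_t width = 4;

    while (width > 1 && (size % width != 0 || align % width != 0)) {
        width /= 2;
    }
    return width;
}
```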
-
-
- 172.
- Date: Fri, 14 Jul 89 14:35:32 PDT
- From: mgbaker (Mary Gray Baker)
- Message-Id: <8907142135.AA133167@sprite.Berkeley.EDU>
- To: sprite
- Subject: Gcc alignment bug
-
- Okay, MAYBE this isn't really a bug, but it sure would be nice if things
- were aligned or at least sized so that they would be aligned. The initialized
- odd-length string here causes the following initialized structure to be on an
- odd byte boundary. This happens in about 5 or 6 places in the kernel and
- causes all sorts of havoc when combined with the gcc bug that does loads and
- stores based on the size of a structure regardless of its alignment.
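-
- The layout Mary describes can be reproduced inside a single structure,
- because a byte-aligned structure may legally start at any offset. A
- C11 alignment specifier is one way to force it onto a word boundary; on
- older compilers, explicit padding has the same effect. The type names
- are hypothetical.
-
```c
#include <stddef.h>

struct Pair {            /* byte-aligned, size 2 */
    char a;
    char b;
};

/* An odd-length string followed by a byte-aligned structure. */
struct Unaligned {
    char        name[3];  /* e.g. "ab" plus its terminating NUL */
    struct Pair pair;     /* lands at offset 3: an odd boundary */
};

/* The same data with the structure forced onto a 4-byte boundary. */
struct Aligned {
    char                    name[3];
    _Alignas(4) struct Pair pair;   /* lands at offset 4 */
};
```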
-
-
- 173.
- Date: Fri, 14 Jul 89 18:23:10 PDT
- From: mendel (Mendel Rosenblum)
- Subject: printing to lw477 broken
-
- When I try to print to lw477 I get the message:
- <51>Jul 14 18:15:21 lpd[c1139]: lw477: ioctl(TIOCLBIS): invalid argument
- but no output.
-
-
- 174.
- Subject: race condition bug w/ migration
- Date: Fri, 14 Jul 89 19:06:30 PDT
- From: Fred Douglis <douglis>
-
- doing many migrations in parallel seems to cause "non-ready process in
- ready queue" on an infrequent basis. the non-ready process has the
- state PROC_EXITING but a backtrace indicates it thinks it should be
- waiting for an RPC. i'll look into this ASAP.
-
-
- 175.
- Date: Sun, 16 Jul 89 02:30:35 PDT
- From: eklee (Edward K. Lee)
- Subject: possible tx geometry bug
-
- Executing "tx =NxM+X+Y" results in a tx window with only M-1 rows.
- However, executing "geometry =NxM+X+Y" from tx does give you M rows.
-
-
- 176.
- Subject: bug: a.out.c out of date?
- Date: Mon, 17 Jul 89 11:23:41 PDT
- From: Fred Douglis <douglis>
-
- There are references to Aout_PageSize that appear to subscript into
- the array based on M_SPARC while Aout_PageSize is only set up for
- M_68020. The source file is /sprite/src/lib/c/etc/a.out.c.
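-
- A defensive shape for such a table, sketched with hypothetical stand-ins.
- The MACH_* constants replace M_68020 and M_SPARC, and the page-size value
- is illustrative only. Designated initializers name each machine's slot
- explicitly. For a missing or out-of-range entry, the lookup returns 0
- instead of whatever happens to be in the slot.
-
```c
#include <stddef.h>

/* Hypothetical machine types standing in for M_68020, M_SPARC, ... */
enum { MACH_68020, MACH_SPARC, MACH_COUNT };

/* One slot per machine type; 0 means "not configured". */
static const unsigned long pageSizeByMach[MACH_COUNT] = {
    [MACH_68020] = 8192,   /* illustrative value only */
    /* [MACH_SPARC] deliberately missing, as in the reported bug */
};

/*
 * Page size for a machine type, or 0 if the type is out of range or
 * its entry was never filled in.  Callers must check for 0.
 */
unsigned long PageSizeFor(int mach)
{
    if (mach < 0 || mach >= MACH_COUNT) {
        return 0;
    }
    return pageSizeByMach[mach];
}
```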
-
-
- 177.
- Subject: bug: full kernel build disk
- Date: Mon, 17 Jul 89 17:54:35 PDT
- From: Fred Douglis <douglis>
-
- oregano hung up again when Mendel tried to remove something from
- /sprite/src/kernel and it was full. I was able to free up a large
- chunk of space without getting hung, somehow -- I removed
- /sprite/src/kernel/sprite/sun3.{old,23Jun...}.
-
-
- 178.
- Subject: bug: tftpd causing lingering kernel lost+found files
- Date: Tue, 18 Jul 89 12:03:43 PDT
- From: Fred Douglis <douglis>
-
- /sprite/src/kernel had 75 megabytes in lost+found, so I tried to
- remove the files. They were almost all mgbaker kernels. After
- removing them, the disk space didn't get reclaimed. I poked around a
- bit and eventually found that mint has about 20-30 tftpd processes
- lying around. I think they must have open handles on the sun4 kernel
- files. Do we have a tftpd maintainer in the house?
-
-
- 179.
- Subject: bug: lost+found reference counts
- Date: Tue, 18 Jul 89 14:19:38 PDT
- From: Fred Douglis <douglis>
-
- some of these are clearly bogus:
- drwxrwxr-x 0 root wheel 8192 Jul 7 15:50 /a/lost+found
- drwxrwxr-x -3 root wheel 8192 Jul 7 15:50 /b/lost+found
- drwxrwxr-x 2 root wheel 16384 Jul 8 18:03 /c/lost+found
- drwxrwxr-x -1 root wheel 8192 Jul 12 17:47 /sprite/lost+found
- drwxrwxr-x 2 root sprite 5
-
-
- 180.
- From: rab (Robert A. Bruce)
- Subject: read error
- Date: Thu, 20 Jul 89 07:02:45 PDT
-
- The dump program crashed last night after getting a read
- error on the file /sprite/spool/mail/mgbaker. The error
- occurred at byte offset 51200.
-
-
-
- 181.
- Date: Thu, 20 Jul 89 23:47:35 PDT
- From: shirriff (Ken Shirriff)
- Subject: Mail got messed up
-
- One of the messages in my mail file seems to have got messed up somehow.
- For some reason, 12 lines of TeX appeared in my mail file:
-
-
- 182.
- Subject: bug: nfs symbolic links incompatible
- Date: Thu, 20 Jul 89 23:58:42 PDT
- From: Fred Douglis <douglis>
-
- I made a set of symbolic links on /rosemary/spare, running on sprite,
- and then tried to reference them from dill (running ultrix). It
- complained they were invalid. rosemary also misbehaved, though in
- rosemary's case "cat foo" would list the name of the file foo points
- to, as though it weren't a symbolic link and the contents were being
- printed. sprite acted like ultrix:
-
- paprika% ln -s foo bar
- paprika% cat bar
- bar: invalid argument
-
- I removed the links on dill and recreated them running on dill. This
- time they worked. The resulting links were readable by all hosts.
- Is this a case of sprite and unix having inconsistent sizes (relating
- to the trailing null character, maybe)?
-
-
- 183.
- Subject: bug with kernel idle time var.
- Date: Fri, 21 Jul 89 13:09:19 PDT
- From: Fred Douglis <douglis>
-
- there used to be a special check to only update the idle time on
- keyboard or mouse input. looks like now serialB updates it too, so
- printing causes eviction.
-
-
- 184.
- Subject: bug making libraries
- Date: Sat, 22 Jul 89 14:15:35 PDT
- From: Fred Douglis <douglis>
-
- I am trying to create libX11.a for the ds3100. When I went into the
- source directory and did a pmake, it made all the object files but
- produced a lot of empty "ar r" lines that didn't actually replace the
- object files or remove them. In some cases they actually were added
- to the archive, but not usually, and i don't see a pattern explaining
- why it only happened some times.
-
- "pmake -n" listed a bunch of commands to do the actual "ar" commands,
- but "pmake" by itself did the empty "ar" commands again. I finally
- broke down and am doing a single "ar ... */ds3100.md/*.o" from the
- shell.
-
-
- 185.
- Date: Sun, 23 Jul 89 11:43:10 PDT
- From: mendel (Mendel Rosenblum)
- Subject: Can't start X without ipServer
-
- Xsprite jumps into the debugger when it is started and the ipServer is not
- running. No message is produced, xinit just hangs.
-
-
- 186.
- Subject: bug: lpd repeatedly restarting
- Date: Sun, 23 Jul 89 20:20:59 PDT
- From: Fred Douglis <douglis>
-
- with the new serial line driver, when lw477 runs out of paper, I get
- messages saying things like
-
- <54>Jul 23 20:19:29 lpd[50b39]: restarting lw477
- Warning: receiver overrun on serialB
- Warning: receiver overrun on serialB
- Warning: receiver overrun on serialB
- <54>Jul 23 ... lpd[50b39]: restarting lw477
-
- i don't believe i ever saw this behavior using the old kernel.
-
-
- 187.
- Date: Fri, 21 Jul 89 09:18:01 PDT
- From: ouster (John Ousterhout)
- Message-Id: <8907211618.AA138019@sprite.Berkeley.EDU>
- To: sprite
- Subject: Bug: crash during boot
-
- Mace crashed twice in a row while booting "sun3.ouster" this morning.
- The crash happened just after messages appeared on the console about
- initiating recovery, relatively early in the boot process. Here's
- some information from Kgdb:
-
- Stack:
- #0 0xe0575b0 in Timer_ScheduleRoutine (newElementPtr=(Timer_QueueElement *) 0xe07c090, interval=1) (timerQueue.c line 374)
- #1 0xe04d9da in RpcDaemonWait (queueEntryPtr=(Timer_QueueElement *) 0xe07c090) (rpcDaemon.c line 418)
- #2 0xe04d3f6 in Rpc_Daemon () (rpcDaemon.c line 109)
- #3 0xe0523c0 in Sched_StartKernProc (func=(void (*)()) 0xe04d3b8) (schedule.c line 839)
-
- At this point in the code, itemPtr was 0xffffffff, and I found a bogus
- element at the end of the timer queue. The contents of the element were:
-
- (links = (prevPtr = 0xffffffff, nextPtr = 0xffffffff), routine = 0xe04da8a,
- time = (seconds = 16, microseconds = 330000), clientData = 0xffffffff,
- processed = 0, interval = 2000)
-
- The "routine" was pointing to Rpc_DaemonWakeup.
-
-
- 188.
- Date: Sun, 23 Jul 89 21:48:11 PDT
- From: ouster (John Ousterhout)
- Subject: Adding a new ds3100
-
- This one is for the bug list: I suggest that we should modify our
- version of bootp to read /etc/spritehosts, so that it isn't necessary
- to modify /etc/bootptab whenever new hosts are added.
-
-
- 189.
- Date: Mon, 24 Jul 89 13:52:07 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in fsstat output
-
- The internal fragmentation statistics from the fsstat command are totally
- bogus. I've fixed the bug in the kernel routine Fs_CheckFragmentation that
- caused this problem.
-
-
- 190.
- Date: Mon, 24 Jul 89 18:16:18 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in timing on ds3100
-
- The csh time command gives bogus numbers on the ds3100 running Sprite. The
- CPU time is greater than the wall clock time. For example:
-
- pride% cat direntires* | awk -f a > /dev/null
- 186.5u 12.3s 1:27 227% 0+0k 0+0io 0pf+0w 212+5101csw
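The csh percentage is (user + system) / elapsed. For the figures above, 186.5 s user plus 12.3 s system is 198.8 s of CPU time against 87 s (1:27) elapsed, or roughly 228%. The printed 227% presumably reflects rounding of the elapsed time. A single-processor DECstation 3100 cannot legitimately exceed 100%.

Below is a small C sketch of that sanity check. The function names are hypothetical.

```c
/* Percentage of wall-clock time spent on the CPU, computed the way the
 * csh "time" summary presents it: (user + system) / elapsed * 100. */
static double cpu_percent(double user_sec, double sys_sec, double elapsed_sec)
{
    return 100.0 * (user_sec + sys_sec) / elapsed_sec;
}

/* On a machine with one CPU, total CPU time can never exceed the
 * elapsed wall-clock time; returns 1 if the figures are consistent. */
static int plausible_on_uniprocessor(double user_sec, double sys_sec,
                                     double elapsed_sec)
{
    return user_sec + sys_sec <= elapsed_sec;
}
```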
-
-
- 191.
- Date: Tue, 25 Jul 89 08:57:22 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in bootp
-
- The bootp daemon goes into an infinite CPU loop if you kill the ipServer.
-
-
- 192.
- Date: Tue, 25 Jul 89 09:51:26 PDT
- From: mendel (Mendel Rosenblum)
- Subject: ipServer on mint died
-
- When I came in this morning the ipServer on mint was in the debugger. It
- died in malloc() with a segmentation fault because the large memory
- pool free list was corrupted. I couldn't figure out what caused the problem,
- but the memory near the corrupted pointer contained the string
- "Copyright (C) 1989 Digital Equipment Corporation." I think it might have
- just choked on this :-)
-
-
- 193.
- Date: Tue, 25 Jul 89 14:30:31 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: sun3.new broken
-
- I tried to boot sun3.new on mint and fscheck failed because it couldn't
- read /dev/rxy0a. Did something change in the dev module?
-
-
- 194.
- Subject: differences between ansi C and DEC C
- Date: Tue, 25 Jul 89 16:15:53 PDT
- From: Fred Douglis <douglis>
-
- I'm running into a lot of trouble porting certain programs to sprite,
- because the ultrix compiler doesn't understand the same things. For
- example, in diff, "void *" causes headaches, and I had to put in
-
- #ifndef __STDC__
- #define void int
- #endif /* __STDC__ */
-
- before the uses of this. Ugh. I couldn't port "file" before for a
- similar reason (and wound up just copying over the ultrix binary).
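`#define void int` rewrites every occurrence of `void` in the file, including function return types. A narrower alternative is a typedef selected by `__STDC__`, the macro ANSI C compilers predefine. It falls back to `char *`, the traditional pre-ANSI generic pointer.

This is a sketch only. It assumes the generic pointer type is the sole problem, and `GenericPtr` is a hypothetical name.

```c
/* Choose a generic pointer type both ANSI and pre-ANSI compilers accept.
 * ANSI compilers define __STDC__ and understand "void *"; older ones get
 * "char *" instead.  GenericPtr is a hypothetical name. */
#ifdef __STDC__
typedef void *GenericPtr;
#else
typedef char *GenericPtr;
#endif
```

Code then declares generic pointers as `GenericPtr` rather than `void *`, and other uses of `void` are left alone.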
-
-
- 195.
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- To: bugs
- Subject: flock broken
-
- flock() doesn't seem to work on sun3.new. It returns with an invalid
- argument. I don't know what the behavior is on sun3.
-
-
- 196.
- Date: Wed, 26 Jul 89 18:40:21 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: directories getting locked
-
- Sometimes when I do a pmake that migrates, it hangs. I can't kill it.
- Then if I try to do an ls in the same directory, it hangs too and I can't
- kill it. The directory becomes totally unavailable. This is inconvenient.
-
-
- 197.
- Subject: bug with ds3100 ar
- Date: Wed, 26 Jul 89 22:20:05 PDT
- From: Fred Douglis <douglis>
-
- I hit the following:
-
- ar r ../ds3100.md/libc.a ds3100.md/MigAsciiToInternal.o
-     ds3100.md/MigGetLocalName.o ds3100.md/MigInternalToAscii.o
-     ds3100.md/Mig_ConfirmIdle.o ds3100.md/Mig_Done.o
-     ds3100.md/Mig_GetAllInfo.o ds3100.md/Mig_GetIdleNode.o
-     ds3100.md/Mig_GetInfo.o ds3100.md/Mig_OpenInfo.o
-     ds3100.md/Mig_UpdateInfo.o
- ar: Info: filename MigAsciiToInternal.o truncated to MigAsciiToInter
- ...
- ar: Warning:ignoring second definition of MigAsciiToInternal defined in archive
- ...
-
- indeed, there are two copies with the same name in there.
-
-
- 198.
- Subject: bug: ipserver dying hangs console; migrating prefixes
- Date: Thu, 27 Jul 89 17:46:43 PDT
- From: Fred Douglis <douglis>
-
- this has probably been reported before; maybe we can boost its
- priority. when mint's ipserver died this afternoon, we were unable to
- login at the console to kill it and start a new one. we could not
- migrate to mint because mint was running an old version of migration.
- finally, jhh suggested that i rlogin to tonkawa and migrate from
- there. (this worked, but only after i cd'd to /sprite, since "/" on
- tonkawa is different from "/" on mint, and mint tried to load its
- prefix table by broadcasting for "/" when it didn't already have a
- handle for "/" on tonkawa.)
-
- anyway, i was able to kill the ipserver once i could find it, and ken
- restarted mint's servers.
-
-
- 199.
- From: rab (Robert A. Bruce)
- Subject: problems with /user1
- Date: Thu, 27 Jul 89 15:45:35 PDT
-
- Martha reported the following problem with /user1:
- > My sprite account (/user1/zimet) appears to be hosed...
- > I have been having problems all day with rsh, rcp, etc.
- > into my directory on sprite. Is this usual?
-
-
- 200.
- Date: Fri, 28 Jul 89 09:17:17 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8907281617.AA528685@sprite.Berkeley.EDU>
- To: bugs
- Subject: /user1 unreadable from cory
-
-
- The problem is that the correct netroute command is not being run on allspice.
- It should run a "netroute -s" before installing /etc/spritehosts into the
- kernel. I have no idea which of the several bootcmds is getting run.
- The copies are in
- /boot/bootcmds
- /hosts/allspice/bootcmds
- /allspiceA/hosts/allspice/bootcmds
-
- which one should be modified?
-
-
- 201.
- From: rab (Robert A. Bruce)
- Subject: allspice out of memory
- Date: Sat, 29 Jul 89 03:59:58 PDT
-
- Allspice ran out of memory while /user1
- was being dumped.
-
-
- 202.
- From: rab (Robert A. Bruce)
- Subject: trashed file
- Date: Tue, 01 Aug 89 02:03:32 PDT
-
- /sprite/src/lib/c/ctype/isdigit.c was trashed. I moved the file
- into isdigit.c.trash and restored the RCS'ed version. This is the
- garbage that was in the file:
- --------------------------------------------------------------------------------
- isdigit(LIB
- $(LINTLIB) : $(SRCS:M*.c) $(HDRS) MAKELINT
- d207 4
- a210 3
- library : $(REGLIB)
- profile : $(PROFLIB)
- lint : $(LINTLIB)
- d212 5
- a216 4
- --------------------------------------------------------------------------------
-
-
- 203.
- Date: Fri, 28 Jul 89 09:21:46 PDT
- From: mendel (Mendel Rosenblum)
- Subject: fscheck bug
-
- Fscheck on allspice running on partition /user1 produced 504 messages of
- the form:
-
- Block count corrected for file 73341. Is 8 should be 6.
- ...
- Block count corrected for file 73366. Is 8 should be 5.
-
- And 28 messages of the form:
-
- File zimet/X11R3/mit-dist/X11/bitmaps/right_ptrmsk references non-allocated descriptor 12987. File Deleted.
- ...
- File zimet/X11R3/mit-dist/X11/bitmaps/sipb references non-allocated descriptor 12990. File Deleted.
-
- Is something broken here?
-
-
- 204.
- Subject: bug: random address fault after recovery
- Date: Fri, 28 Jul 89 09:34:15 PDT
- From: Fred Douglis <douglis>
-
- I found that a window of mine had gone away, though I saw no msg in
- my syslog to account for it (such as a page fault problem). However,
- when I tried to restart the program (emacs), it hit a bus error immediately.
- When I killed the debuggable process and tried again, it worked okay.
-
- I have no idea how to repeat this bug, but I thought it would be worth
- reporting in case it becomes more common (big game).
-
-
- 205.
- Subject: I want to debug hanging migrations
- Date: Fri, 28 Jul 89 10:41:57 PDT
- From: Fred Douglis <douglis>
-
- People have become fairly complacent about problems with the system,
- killing processes and/or rebooting when things break rather than
- taking the time for someone to investigate the problem in detail.
- This makes it harder to identify the problems when they arise. At
- this point, there's one bug in particular that I'd like to ask people
- to tell me about immediately: if a pmake hangs part-way through, I
- want to debug the two machines involved and figure out what's going
- on. If I'm on the system, please come to me rather than killing the
- pmake.
-
- (Spriters: this is related to the bug Mary saw w.r.t. file locks. I
- didn't see the simple explanation I hoped to see, so I need to look
- into this the next time it comes up instead.)
-
-
- 206.
- Date: Fri, 28 Jul 89 14:19:36 PDT
- From: ouster (John Ousterhout)
- Subject: DS3100 bug: not enough processes?
-
- While beating on Pride to flush out the ipServer bug I created
- lots of processes. At one point the kernel entered the debugger
- with the message "Mach_SetupNewState: Out of machine state structs".
- Sounds like maybe the limit on # of processes and the number of
- states in Mach don't match.
-
- 207.
- Subject: ds3100 bug: WaitForSomething message
- Date: Fri, 28 Jul 89 15:10:50 PDT
- From: Fred Douglis <douglis>
-
- I keep getting "WaitForSomething(): select: errno=73" blasted on the
- console of the ds3100, despite having a window catting /dev/syslog.
-
- 208.
- Date: Fri, 28 Jul 89 15:16:04 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: tx extra selection stripe
-
- Sometimes in tx I get a black stripe to the right of the cursor that won't
- go away. It looks just like a selection, but it isn't the selection since
- it stays when I select something elsewhere. Clearing the window, etc, doesn't
- get rid of it. How do I make it go away?
-
-
- 209.
- Subject: bug: server deadlocks
- Date: Fri, 28 Jul 89 16:18:29 PDT
- From: Fred Douglis <douglis>
-
- when /a filled up, i couldn't get out of a process on kvetching
- (swapping off of allspice) and i saw a message about a remove RPC to
- allspice being hung. It's bad enough when a remove on a full disk
- gets hung, but when a remove on an empty disk on another machine gets
- hung, something's pretty bad.
-
- some of us have also noticed that allspice has had a tendency to hang
- or crash when mint or oregano dies. any suggestions about what might
- be causing this interdependency would certainly be appreciated!
-
-
- 210.
- Subject: bug: can't backtrace user stack in kgdb
- Date: Fri, 28 Jul 89 16:43:07 PDT
- From: Fred Douglis <douglis>
-
- I am trying to find out why a migrated process is in the WAIT state,
- but when I do "where" from kgdb it just returns without printing
- anything, and "i r" prints 0 for all the registers. Seems like the
- debugging interface is screwed up. This is the Jul24 installed
- kernel.
-
- 211.
- Date: Fri, 28 Jul 89 19:51:21 PDT
- From: gibson (Garth Gibson)
- Message-Id: <8907290251.AA722218@sprite.Berkeley.EDU>
- To: bugs
- Subject: ds3100
-
- I've tried to port my simulation code to the 3100s (kvetching). After
- Fred fixed one problem I ran into this:
-
- It appears to go through initialization, including a printf, then it
- hangs. When I run it under dbx and arbitrarily ^C, I get:
- Interrupt [scalb, :0x408474]
- swc1 f20,20(sp)
- (dbx) where
- > 0 scalb(x = 1.0, N = 54) [0x408474]
- 1 scalb(x = 1.0, N = 54) ["ds3100.md/support.c":98, 0x40853c]
- 2 scalb(x = 1.0, N = 54) ["ds3100.md/support.c":98, 0x40853c]
- 3 scalb(x = 1.0, N = 54) ["ds3100.md/support.c":98, 0x40853c]
- 4 scalb(x = 1.0, N = 54) ["ds3100.md/support.c":98, 0x40853c]
- 5 scalb(x = 1.0, N = 54) ["ds3100.md/support.c":98, 0x40853c]
- and at least 400 more lines identical to the last 5.
-
- When I stopped at a particular address and "next"ed forward I get:
- [2] stopped at [.block2:612 ,0x401144] if( st_time_til_loss.cnt>=iters ) {
- (dbx) next
- [.block3:638 ,0x4014ec] for( i=0; i<num_disks; i++ ) {
- (dbx) next
- [.block3:639 ,0x401508] disks[i].failed = FALSE;
- (dbx) next
- [.block3:640 ,0x40152c] if( init_fail_rate != 0 ) { /* use Brady lifetime distr */
- (dbx) next
-
- Illegal instruction [.block3:640 +0x1c,0x401548]
- if( init_fail_rate != 0 ) { /* use Brady lifetime distr */
- (dbx) where
- > 0 .block3 ["reli.c":640, 0x401548]
- 1 .block2 ["reli.c":640, 0x401548]
- 2 main(argc = 1, argv = 0x7fdffd0c) ["reli.c":640, 0x401548]
-
- and Fred tells me that kvetching's console got a message about
- "invalid breakpoint".
-
- I'm declaring failure for a while, so I'll copy my code (~gibson/RELI/reli.c)
- to (~gibson/RELI/reli.c.bug) and leave the executable (same/ds3100.md/RELI).
-
-
- 212.
- Date: Sat, 29 Jul 89 09:53:11 PDT
- From: mendel (Mendel Rosenblum)
- Subject: allspice recovery damages processes on murder
-
- Just after allspice recovered last night, the Xsprite, tx, and cat /dev/syslog
- processes I had running on murder entered the debugger with segmentation faults.
-
-
- 213.
- Date: Sat, 29 Jul 89 15:12:52 PDT
- From: gibson (Garth Gibson)
- Message-Id: <8907292212.AA66852@sprite.Berkeley.EDU>
- To: bugs
- Subject: resetting tx
-
- when i break out of top in an odd way
- (in this case, I killed a process from within top
- and somehow this terminated the top)
- none of my keystrokes are echoed
-
- when this happened on BSD i did a "reset" but
- reset in tx says "Type tx unknown"
-
- using the menu entry "clear and reset window" also fails
- to turn keystroke echoing back on
-
-
- 214.
- Date: Sun, 30 Jul 89 15:45:16 PDT
- From: gibson (Garth Gibson)
- Subject: nfsmount core leak ?
-
- Basil is currently experiencing substantial paging whenever I do anything
- (in particular, copying from nfs to nfs causes more than 15 page faults per
- second, and a small 24KB copy takes more than 10 seconds). Basil
- is the server for the nfsmount of /spur. It is only an 8MB machine,
- and although I do have 12 windows (10 tx) and 5 rsh's running,
- the problem appears to be nfsmount: it is at 4.2 MB. When I do things
- that involve local execution, nfsmount is paged out; when I do things
- across nfs, about 2MB are paged in.
-
- I killed nfsmount and restarted it and its memory usage was only 184 KB.
- I did a giant ls -R across nfs and it grew to 312 KB but seemed to stay
- there.
-
- Mendel speculated that this might be a core leak in nfsmount. Does
- anyone want to run nfsmount for /spur on their machine?
-
-
- 215.
- Date: Mon, 31 Jul 89 14:23:45 PDT
- From: deboor (Adam R de Boor)
- Subject: vi segv
-
- I logged in to thyme from envy, so my rows and columns were 0,0. When I did
- an stty rows 61 (forgetting that columns would be 0) and foregrounded a vi,
- it complained about screen too large for internal buffer, then died with
- a segv. It's on the debug queue on thyme (pid e1a49) if anyone wants
- to look at it. If not, could someone kill it for me :)
-
-
- 216.
- Subject: bug: ds3100 exec.h/a.out.h inconsistency
- Date: Tue, 01 Aug 89 13:51:11 PDT
- From: Fred Douglis <douglis>
-
- Programs that use a.out.h won't compile for the ds because N_TXTOFF is
- called with one param in a.out.h but defined to take two params in
- sys/exec.h.
-
- 217.
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- To: bugs
- Subject: pmake all does ds3100
-
- I did a "pmake all" on a sun3 and it compiled a completely worthless
- ds3100 version of the program.
-
-
- 218.
- Subject: warning to people trying to debug on the ds3100
- Date: Wed, 02 Aug 89 17:57:37 PDT
- From: Fred Douglis <douglis>
-
- Mike said something about adding support for debugging, but for the
- time being, it's often hard to impossible to get a backtrace of a
- process, depending on how it stops. I removed the mousetrap I had put
- in loadavg, because calling abort() wouldn't let me look at anything
- interesting. I also find that emacs locks up on me after I start a
- sub-process, maybe one time in 20 or 30, and the backtrace after a
- kill -DEBUG was only one call deep and was probably wrong to boot.
-
-
- 219.
- Subject: is time going backward?...
- Date: Wed, 02 Aug 89 19:54:34 PDT
- From: Fred Douglis <douglis>
-
- ... or are user variables getting trashed?
-
- finger uses a kernel "idle time" variable that was causing it to get
- confused. It turned out that kvetching's idle time was -4 seconds.
- Since this is calculated by doing a Timer_GetTimeOfDay and then doing
- another Timer_GetTimeOfDay and subtracting the first from the second,
- a difference of -4 means either that the clock is getting messed up or
- the loadavg daemon's variables are. given the NaN I've seen, perhaps
- it's the second, in which case this bug report is nothing new, but I
- figured it could also be related to the time-flowing-backward bug that
- Ed reported a while ago.
-
- all in all, kvetching's clock seems fairly accurate ("date" coincides
- pretty well with reality).
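The idle-time calculation described above subtracts two seconds/microseconds timestamps. A negative result by itself cannot distinguish a clock that stepped backwards from a corrupted saved sample.

Below is a minimal C sketch of the arithmetic. `TimeVal` and `elapsed_usec` are hypothetical names. The two fields are modeled on the seconds/microseconds pair visible in the timer-queue dump earlier in this list.

```c
/* Hypothetical timestamp with separate seconds and microseconds fields. */
typedef struct {
    long seconds;
    long microseconds;
} TimeVal;

/* Returns (later - earlier) in microseconds.  A negative result means
 * "later" is actually earlier: either the clock went backwards or one
 * of the saved samples was trashed. */
static long long elapsed_usec(TimeVal earlier, TimeVal later)
{
    return (long long)(later.seconds - earlier.seconds) * 1000000LL
         + (later.microseconds - earlier.microseconds);
}
```

For example, an earlier sample of {100, 500000} and a later one of {96, 500000} yields -4000000, the -4 seconds finger reported.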
-
-
- 220.
- Date: Thu, 3 Aug 89 12:07:42 PDT
- From: douglis@sprite.Berkeley.EDU (Fred Douglis)
- Subject: bug: header updating must change date
-
- If a header file is installed using update, it's possible for object files
- not to get recompiled because they've been compiled since the date when the
- header was written, even if they haven't been compiled since the header
- was installed. This could account for why the debugger still can't
- backtrace user processes on sun3s, since kgdb sees the wrong version of
- Mach_UserState.
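make-style tools rebuild a target only when a prerequisite's modification time is newer than the target's. This sketch assumes, as the report implies, that update keeps the header's original modification time when it installs it.

Take a header written at time 100, an object compiled at 200, and the header installed at 300 but still stamped 100. The object is never rebuilt. `needs_rebuild` is a hypothetical helper, not pmake's code.

```c
#include <time.h>

/* Returns 1 if the object must be recompiled because the header it
 * depends on carries a newer modification time, 0 otherwise. */
static int needs_rebuild(time_t object_mtime, time_t header_mtime)
{
    return header_mtime > object_mtime;
}
```

Stamping the header with its installation time (300 in the example) would make the comparison trigger a rebuild.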
-
-
-
- 221.
- Subject: bug: ds3100 clock
- Date: Thu, 03 Aug 89 17:31:21 PDT
- From: Fred Douglis <douglis>
-
- it was 5 minutes slow when I checked just now. confirmation that time
- may occasionally be flowing backwards, given the -4 seconds idle time
- i saw yesterday.
-
-
- 222.
- Subject: bug: tx caret disappearing
- Date: Thu, 03 Aug 89 22:32:12 PDT
- From: Fred Douglis <douglis>
-
- On paprika, when a tx window fills and starts scrolling, the input
- caret is barely visible at the bottom of the window. on kvetching,
- the caret disappears entirely, and i must scroll the window up so some
- blank space appears on the bottom in order to get a caret to appear.
-
- this occurs even if i open a window from paprika on kvetching, so it's
- not the ds3100 tx client (the same sun3 binary produces different results
- on the two different displays).
-
-
- 223.
- Date: Fri, 4 Aug 89 08:51:30 PDT
- From: ouster (John Ousterhout)
- Subject: Bug: /sprite/users directory weird
-
- Something is wrong with /sprite/users, or with du, or with ls.
- If I cd to /sprite/users and type "du", a bunch of lines appear
- for a subdirectory "cmds.ancient". Yet if I type "ls" in
- /sprite/users, no such directory appears, and I cannot cd to
- /sprite/users/cmds.ancient. This paradox appears to be
- repeatable, at least for me on Mace.
-
-
- 224.
- Subject: bug: inflated loadavgs
- Date: Fri, 04 Aug 89 11:17:05 PDT
- From: Fred Douglis <douglis>
-
- At least three hosts right now are listed as having load averages of
- over 1.0 although there are apparently no processes using up vast
- amounts of CPU time. I went to murder and l1-r repeatedly and there
- were never any ready processes. each host is running a different
- kernel, so it's not like a bug was just introduced. i suspect that
- the "numReadyProcesses" variable is getting confused but have been
- unable so far to find out how. If anyone knows of a repeatable case
- to get machines into this state please let me know.
-
-
- 225.
- Date: Mon, 7 Aug 89 11:38:47 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: cc include path defaults to sun3.md
-
- If you compile something for the sun4, without explicitly putting the
- -I/sprite/lib/include/sun4.md back into its include path in a .mk file,
- it will pick up header files from /sprite/lib/include/sun3.md. I don't think
- this is a good idea, since it silently includes the wrong stuff in many cases.
- Either none of the machine types should have a default include path, or they
- all should have ones that work.
-
-
- 226.
- Subject: Mail installed on ds3100
- Date: Mon, 07 Aug 89 11:36:27 PDT
- From: Fred Douglis <douglis>
-
- I figured out that Mail wouldn't link because it has some arrays of
- structures that the dec compiler/loader can't handle. this is
- really their bug rather than ours, and I am inclined to patch around
- it temporarily and wait for gcc rather than trying to fix the bug
- (since we don't have sources anyway). Maybe if Mike wants to pass the
- problem on to people at DEC, that would be useful?
-
- Anyway, the fix was to add "-G 0" to the cc flags so Mail is compiled
- without using what they call the "global pointer".
-
-
- 227.
- Subject: ds3100 bug: FPU interrupt in kernel mode
- Date: Mon, 07 Aug 89 12:56:51 PDT
- From: Fred Douglis <douglis>
-
- kvetching died with this just now. kdbx just kept printing a
- backtrace of an infinite number of MachFPInterrupt calls. Any
- suggestions of something to look at next time this happens?
-
-
- 228.
- Date: Mon, 7 Aug 89 13:59:01 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: no documentation on malloc tracing
-
- Is there no man page describing how to turn on memory tracing of different
- sorts? You can read the code and piece it together by trial and error, but
- it sure would be nicer just to read a man page.
-
-
- 229.
- Date: Mon, 7 Aug 89 22:33:39 PDT
- From: david@rosemary.Berkeley.EDU (David A. Wood)
- Subject: /tmp on mace and murder
-
- There seems to be a problem with /c on both mace and murder.
- Since both systems have /tmp linked to /c/tmp, many programs
- (including mail) don't work.
-
-
- 230.
- Date: Tue, 8 Aug 89 09:36:15 PDT
- From: ouster (John Ousterhout)
- Subject: Piracy in debugger again
-
- Piracy has entered the debugger again with the message
-
- Bad kernel TLB Fault
- Entering debugger with a TLB LD miss exeception at PC 0x0
-
-
-
- 231.
- Date: Wed, 9 Aug 89 12:04:51 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: migration/rpc problem?
-
- I get messages of the following type when I do a pmake:
- Warning: Proc_RpcRemoteCall: invalid pid: f1a67.
- The pmake then hangs. I do have a process f1a67:
- f1a67 MIG 2802 ffffffff sage c2121 sh -ev
- Any idea what the problem is?
-
-
- 232.
- Subject: bug: repeated recovery
- Date: Tue, 08 Aug 89 15:50:04 PDT
- From: Fred Douglis <douglis>
-
- kvetching went into an infinite loop recovering w/ mint.
- mint's syslog said:
-
- 8/8/89 15:47:12 kvetching (2) starting recovery
- 8/8/89 15:47:15 kvetching (2) completed recovery
- Fs_RpcIOControl: Stream/handle mis-match
- Stream <32, 32, 165> => File I/O <32, 0, 1881>
-
- kvetching said file 1881 had a stale handle, and then tried again.
-
-
- 233.
- Subject: ds3100 bug: another recovery problem
- Date: Wed, 09 Aug 89 15:50:05 PDT
- From: Fred Douglis <douglis>
-
- when oregano rebooted, kvetching started printing "(" over and over on
- its console. One process claimed to be in the running state, and lots
- of others were ready. An RPC to kill the running process got hung
- since the rpc daemon couldn't run. I rebooted out of frustration,
- though I suppose I should have poked around first.
-
-
- 234.
- Subject: ds3100 bug: XIO reset
- Date: Wed, 09 Aug 89 18:10:32 PDT
- From: Fred Douglis <douglis>
-
- I occasionally have X windows just disappear. Usually they're my
- xbiff window or the tx that cats /dev/syslog. I get "XIO: Connection
- reset by peer" when this happens. Any ideas?
-
-
-
- 235.
- Date: Thu, 10 Aug 89 11:36:41 PDT
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: bug: wall and rlogins
-
- wall was never fixed to notify remote users. we have a reasonable number
- of such users, especially Martha, who would appreciate such notification when
- the world is about to end.
-
- also, wall doesn't talk to the cory hosts because /hosts/tonkawa, et al.,
- aren't the real directories.
-
-
-
-
- 236.
- Date: Thu, 10 Aug 89 17:06:27 PDT
- From: shirriff (Ken Shirriff)
- Subject: rpn is broken
-
- I recompiled rpn and now the octal and hex functions don't work. The
- problem seems to be due to varargs dropping parameters. Can someone
- who understands varargs better than I do take a look? The problem is
- in src/main.c around line 147, where it calls dpyprintf. Then in
- dpyprintf in dpy/dpy.c, the arguments don't seem to be correct.
-
-
- 237.
- Date: Thu, 10 Aug 89 18:42:04 PDT
- From: eklee (Edward K. Lee)
- Subject: fscmd
-
- Sometimes when I execute fscmd -f, I get a message saying "1 locked blocks left".
- What does this mean?
- The number of locked blocks seems to accumulate over time.
-
-
- 238.
- Date: Fri, 11 Aug 89 09:50:34 PDT
- From: ouster (John Ousterhout)
- Subject: Stale handle warnings
-
- I've gotten 3 stale handle warnings this morning:
-
- 8/11/89 8:49:13 oregano (38) RmtFile "/tmp//Mx.Re334.1" <3,55891> Write-back failed: stale handle
- 8/11/89 9:44:36 mint (32) RmtFile "tfAA858935" <1,62649> Write-back failed: stale handle
- 8/11/89 9:44:41 mint (32) RmtFile "/sprite/spool/mail/douglis" <1,1010> Write-back failed: stale handle
-
- I've also gotten 4 "oregano (38) completed recovery" messages this
- morning, even though neither mace nor oregano has crashed.
-
-
- 239.
- Date: Fri, 11 Aug 89 10:02:51 PDT
- From: ouster (John Ousterhout)
- Message-Id: <8908111702.AA596784@sprite.Berkeley.EDU>
- To: bugs
- Subject: Bug: finger timing out on pepper
-
- Whenever I run "finger" right now, the following messages appear
- in my syslog window:
-
- <getIOAttr> 8/11/89 9:59:20 pepper (16) RPC timed-out
- FsRemoteGetIOAttr failed <30002>: device <0,3343505> at server 16
-
-
- 240.
- Subject: stale handles
- Date: Fri, 11 Aug 89 10:15:54 PDT
- From: Fred Douglis <douglis>
-
- perhaps this is confirmation that the "stale handle" warnings and
- trashed files are related. John reported "write-back failed" on my
- spool file, and twice this morning my mail file has been corrupted
- (nulls in-between two messages).
-
-
- 241.
- Date: Fri, 11 Aug 89 11:17:47 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: ds3100 mail problems
-
- If I type 'ctrl-C' while composing a mail message I get the standard
- "Interrupt -- one more to kill letter" message. The second "ctrl-C"
- doesn't do anything. I have to type "ctrl-Z" and kill the job.
-
-
- 242.
- Subject: bug: null length symlinks
- Date: Fri, 11 Aug 89 13:28:01 PDT
- From: Fred Douglis <douglis>
-
- When /c filled up before, the symbolic links I created wound up as 0
- length (pointing to nothing). Directories weren't created due to
- lack of disk space -- couldn't the same logic be applied to symbolic
- links, rather than creating 0-length links? (I don't think the
- problem affects files, since once space was freed up the files were
- apparently written okay.)
-
-
- 243.
- Date: Fri, 11 Aug 89 13:44:46 PDT
- From: brent (Brent Welch)
- Subject: zero length symbolic links
-
- Indeed, the current implementation of symbolic links
- has a number of problems, including the feature
- of creating zero-length symbolic links when the
- disk is full. That problem can be fixed in
- Fs_SymLink by removing the link if the Fs_Write fails.
- However, a more fundamental problem is that the
- creation of symbolic links should be a "domain dependent"
- operation instead of being composed of the open, write, and
- close "domain dependent" operations. (The problem with
- disk full still has to be addressed with this arrangement.)
- If we make this change then we'll be able to create
- symbolic links in NFS domains correctly. (Interestingly,
- while the NFS protocol has a SYMLINK RPC, it also allows
- you to create a file of type symbolic link and write
- a value to it. It's too bad that this works because it
- means that we can create sprite-like symbolic links in
- NFS domains. The difference is in the presence (in sprite)
- of a trailing null.)
- brent
- ps. The file servers already guard against zero-length links,
- so oregano just complained about them.
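-
- Here is a minimal sketch of the cleanup Brent describes, which undoes the link if the
- write fails. It is written against POSIX open/write/unlink rather than Sprite's
- internal Fs_SymLink/Fs_Write, whose signatures are not shown here. The
- function name create_link_file is hypothetical. It stores the target plus a
- trailing null, as in the Sprite format mentioned above.
-
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: create a "link file" whose contents are the
 * target name plus a trailing null. If the write fails or is short
 * (e.g. the disk is full), remove the file this call just created
 * instead of leaving a zero-length link behind.
 */
int create_link_file(const char *path, const char *target)
{
    size_t len = strlen(target) + 1;          /* include trailing null */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0777);
    if (fd < 0) {
        return -1;                            /* nothing created, nothing to undo */
    }
    ssize_t n = write(fd, target, len);
    int closeFailed = (close(fd) != 0);
    if (n < 0 || (size_t)n != len || closeFailed) {
        unlink(path);                         /* undo the half-made link */
        return -1;
    }
    return 0;
}
```
-
- Because of O_EXCL, a call on an existing path fails without touching the
- existing file, so only a file this call created is ever unlinked.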
-
-
- 244.
- Date: Fri, 11 Aug 89 14:02:34 PDT
- From: shirriff (Ken Shirriff)
- Message-Id: <8908112102.AA918313@sprite.Berkeley.EDU>
- To: bugs
- Subject: Compiler bug
-
- On the sun3, if I cast a double to an unsigned int, I get 0. Casting a
- float to unsigned int or double to int works.
- (This is why rpn wasn't working.)
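-
- A small regression check for the conversions Ken describes might look like
- the following. The function name and test values are illustrative. The
- volatile qualifiers keep the compiler from folding the casts at compile time,
- so the generated conversion code (or library call) actually runs.
-
```c
#include <stdio.h>

/*
 * Regression check: a double cast to unsigned int should keep its
 * value, as a float cast to unsigned int or a double cast to int does.
 * Assumes a 32-bit (or wider) unsigned int.
 */
int conversions_ok(void)
{
    volatile double big = 3000000000.0;   /* above INT_MAX, fits in unsigned */
    volatile double small = 42.0;
    volatile float f = 42.0f;

    unsigned int fromBig = (unsigned int)big;
    unsigned int fromSmall = (unsigned int)small;
    unsigned int fromFloat = (unsigned int)f;
    int fromDoubleToInt = (int)small;

    if (fromBig != 3000000000u || fromSmall != 42u) {
        printf("double -> unsigned int broken: %u %u\n", fromBig, fromSmall);
        return 0;
    }
    if (fromFloat != 42u || fromDoubleToInt != 42) {
        printf("float -> unsigned or double -> int broken\n");
        return 0;
    }
    return 1;
}
```
-
- While the bug is present, calling conversions_ok() from a test main() on the sun3
- would be expected to print a message and return 0.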
-
-
- 245.
- Date: Fri, 11 Aug 89 14:56:26 PDT
- From: brent (Brent Welch)
- Subject: System failures
-
- Mint and Oregano crashed and turned up (at least) three bugs.
-
- 1) pwd in a pseudo-file-system isn't fully correct. There is
- new code to return the prefix associated with an open file,
- and this crashed Oregano. The pwd was on sage, and the nfsmount
- was running on Oregano. I think the bug is that the shadow
- stream descriptor on Oregano (the shadow of the stream set up
- on sage) isn't set up the same as the real stream descriptor on
- sage, and the code should use the client's information instead
- of forwarding the operation to the server. If that isn't clear,
- then don't worry about it, I think I have a handle on it.
-
- 2) Mint got an open error on a file in /c/tmp because Oregano
- was down. It then erased its handle information for the
- /sprite prefix, oops. Needless to say, this prevented Oregano
- from completing its boot sequence, and required a restart of mint.
- I don't know, yet, why mint would do such a thing. It may have
- been confused by pathname redirection, /tmp => /sprite/tmp => /c/tmp.
- After getting the error on /c/tmp it wrongly erased information
- about /sprite instead of /c.
-
- 3) After Mint rebooted /tmp was gone. Apparently this has happened
- before. I suspect something in mint's boot script.
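-
- In item 2, the entry that should be invalidated is the one whose prefix is
- the longest whole-component match for the path that failed. The following
- hypothetical sketch of that lookup is not Sprite's prefix-table code:
-
```c
#include <stddef.h>
#include <string.h>

/* Does prefix cover path on whole components? "/c" covers "/c/tmp"
   but not "/cmds"; "/" covers every absolute path. */
static int prefix_covers(const char *prefix, const char *path)
{
    size_t n = strlen(prefix);
    if (strncmp(prefix, path, n) != 0) {
        return 0;
    }
    return n == 1 || path[n] == '\0' || path[n] == '/';
}

/* Return the longest table entry covering path, or NULL if none. */
const char *longest_prefix(const char *const table[], size_t count,
                           const char *path)
{
    const char *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (prefix_covers(table[i], path) &&
            (best == NULL || strlen(table[i]) > strlen(best))) {
            best = table[i];
        }
    }
    return best;
}
```
-
- With a table holding "/", "/sprite", and "/c", a failure on /c/tmp (the end of
- the /tmp => /sprite/tmp => /c/tmp redirection) selects "/c", so the /sprite
- entry would be left alone.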
-
-
- 246.
- Date: Fri, 11 Aug 89 16:51:56 PDT
- From: shirriff (Ken Shirriff)
- Subject: kgdb problem
-
- I ran into a problem debugging on allspice with kgdb.sun3. The debugger
- would crash with a segmentation violation when I tried to examine a
- particular structure.
-
- I tried to recompile kgdb.sun3 to help find the problem, but when I try
- to recompile kgdb.sun3/values.o, cc1.sparc dies and the cc hangs.
-
-
- 247.
- Date: Fri, 11 Aug 89 17:42:46 PDT
- From: mendel (Mendel Rosenblum)
- Subject: sun4 compiler problem
-
- When compiling the fstat program for the sun4, gcc generates references to the
- undefined symbol ___fixunsdfsi.
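-
- ___fixunsdfsi is the gcc support routine for converting a double to an
- unsigned int. The extra leading underscore comes from the C symbol-naming
- convention. The link therefore fails whenever the library that supplies it is
- not linked in. The following sketch shows how such a routine can be built from the
- signed conversion alone, in the spirit of libgcc's generic version. The name
- double_to_uint is hypothetical:
-
```c
/*
 * Convert a double to unsigned int using only the signed conversion.
 * Assumes a 32-bit unsigned int and 0 <= d < 2^32.
 */
unsigned int double_to_uint(double d)
{
    const double two31 = 2147483648.0;        /* 2^31 */
    if (d >= two31) {
        /* Shift into signed range, convert, then restore the top bit. */
        return (unsigned int)(int)(d - two31) + 0x80000000u;
    }
    return (unsigned int)(int)d;
}
```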
-
-
- 248.
- Date: Sat, 12 Aug 89 10:07:11 PDT
- From: ouster (John Ousterhout)
- Message-Id: <8908121707.AA793393@sprite.Berkeley.EDU>
- To: bugs
- Subject: Bug in finger idle times?
-
- I received the following output from finger at about 10:00 this morning:
- ...
- Notice that every rlogin-ed connection has an idle time of 3 minutes,
- even though none of the supposed users is actually here working.
- Furthermore, notice, for example, that Fred's idle time on Allspice
- is 3 minutes, yet his idle time on Kvetching, the source of the
- connection to Allspice, is many hours. I checked /hosts/allspice/rlogin*,
- and two of the files, rlogin1 and rlogin3, really do have last-access
- times of 9:56 this morning.
-
- I suspect that it is no coincidence that Oregano finished a reboot at
- exactly the claimed last-access time of all these rlogin connections.
- It appears to me that something related to recovery (device re-open?)
- is updating the access times when it shouldn't.
-
-
- 249.
- Subject: sun4 bug: rlogind hung
- Date: Sun, 13 Aug 89 18:37:14 PDT
- From: Fred Douglis <douglis>
-
- I hit ^C and started typing. I then saw "Fs_Dispatch: stream ID 257
- out of range" and my rlogin to allspice hung.
-
-
- 250.
- Date: Sat, 12 Aug 89 11:06:21 PDT
- From: mendel (Mendel Rosenblum)
- Subject: mouse problems on sun4
-
- If you move the mouse on anise while doing a compile you get the message
- "Warning: receiver overrun on mouse" printed in the syslog and the system
- acts like you pushed down a mouse button. Many times this causes a uwm menu
- to appear and then disappear.
-
-
- 251.
- Date: Sat, 12 Aug 89 11:49:33 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8908121849.AA995596@sprite.Berkeley.EDU>
- To: bugs
- Subject: malloc on sun4 doesn't align memory correctly
-
- Malloc on the sun4 returns objects aligned only to a four-byte boundary.
- This means that accessing a double inside a malloced structure will fail. For
- example:
-
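- #include <stdlib.h>
-
- struct foo {
-     /* other stuff */
-     double max;
-     /* more other stuff */
- } *foo;
-
- int main(void)
- {
-     foo = malloc(sizeof(struct foo));
-     if (foo == NULL)
-         return 1;
-     foo->max = 0.0;   /* 8-byte store: can fault if foo is only 4-byte aligned */
-     return 0;
- }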
-
- seg faults every time on Sprite. The large memory allocator appears to
- align stuff correctly.
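-
- Until malloc itself is fixed, one possible workaround is a wrapper that
- over-allocates and rounds the returned address up to an 8-byte boundary.
- The names align8_malloc and align8_free are hypothetical. Memory from
- align8_malloc must be released with align8_free, never plain free.
-
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DOUBLE_ALIGN 8   /* alignment needed for a double on the sun4 */

/* Over-allocate, round up to DOUBLE_ALIGN, and stash the pointer that
   malloc returned just below the aligned block. */
void *align8_malloc(size_t size)
{
    unsigned char *raw = malloc(size + sizeof(void *) + DOUBLE_ALIGN - 1);
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)(raw + sizeof(void *));
    uintptr_t aligned = (start + DOUBLE_ALIGN - 1) &
                        ~(uintptr_t)(DOUBLE_ALIGN - 1);
    ((void **)aligned)[-1] = raw;     /* remember the original pointer */
    return (void *)aligned;
}

/* Free a block obtained from align8_malloc. */
void align8_free(void *p)
{
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}
```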
-
-
- 252.
- From: rab (Robert A. Bruce)
- Subject: allspice crashed
- Date: Mon, 14 Aug 89 01:08:53 PDT
-
- Allspice crashed while /user1 was being dumped.
-
- Pmeg lists empty
- Program received signal 16, Interrupt Trap
-
- #0 panic (__builtin_va_alist=-167186280) (sysPrintf.c line 188)
- #1 0xf608f128 in PMEGGet () (sun4.md/vmSun.c line 1329)
- #2 0xf6090e18 in VmMach_PageValidate () (sun4.md/vmSun.c line 3109)
- #3 0xf6087678 in VmPageValidateInt () (vmPage.c line 644)
- #4 0xf6088990 in PreparePage () (vmPage.c line 1657)
- #5 0xf608848c in Vm_PageIn () (vmPage.c line 1470)
- #6 0xf600fa80 in testModuloLabel ()
- ERROR: invalid read address 0xcac4
-
- 253.
- Date: Mon, 14 Aug 89 11:59:48 PDT
- From: brent (Brent Welch)
- Subject: Allspice crashed, level 15 interrupt
-
- Allspice crashed again with a level 15 interrupt error.
- Mendel says that this means that the cache hit a
- protection error during a write-back. This is an
- asynchronous error so we couldn't really figure out
- the exact details of the problem. We were able
- to continue allspice, and rlogind ended up in the
- debugger because it had the affected page.
-
-
- 254.
- Date: Mon, 14 Aug 89 12:04:07 PDT
- From: brent (Brent Welch)
- Subject: mint erased "/sprite" again
-
- When allspice crashed mint erased its prefix table
- entry for "/sprite". I rebooted mint with a new
- kernel that supposedly guards against this, but
- it didn't help. I logged in as root and typed
- "cd sprite" and it immediately printed out
- "Broadcasting for server of /sprite", oops.
- This seems repeatable, although I bet that allspice
- (or oregano) has to be down at the time. By the way,
- the machines were down for only 1/2 hour today
- (11:18 to 11:50) during all of this. I'll wait until
- "after hours" to reenact the problem with mint and /sprite.
-
- 255.
- Date: Mon, 14 Aug 89 15:32:19 PDT
- From: shirriff (Ken Shirriff)
- Subject: undefined net routines
-
- There are a bunch of routines used in netCode.c and netRoute.c that
- aren't defined: Net_InetChecksum, Net_InetChecksum2, Net_InetAddrToString,
- and Net_EtherAddrToString. I can't compile a kernel because these
- aren't defined, so if anyone knows what the situation is with these,
- please let me know.
-
-
- 256.
- Subject: bug: lpd broken
- Date: Mon, 14 Aug 89 16:59:18 PDT
- From: Fred Douglis <douglis>
-
- i saw an error message the last time i booted paprika, and thyme now
- has 3 lpds in the debugger. did someone install a broken version
- recently?
-
-
- 257.
- Date: Tue, 15 Aug 89 11:31:16 PDT
- From: shirriff (Ken Shirriff)
- Subject: Evil black blob in tx
-
- To repeatably create the indestructible black bar in tx that someone
- reported earlier, click control-left button twice on an opening
- parenthesis and then clear the window.
-
-
- 258.
- Date: Tue, 15 Aug 89 11:48:04 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: mx problems
-
- My mx window died with the following error:
-
- hijack<jhh 291> X Error: bad request code
- Request Major code 162
- Request Minor code
- ResourceID 0xe0012
- Error Serial #1349
- Current Serial #1488
-
- I don't know what I was doing at the time -- I think I was trying to
- scroll up with a lot of stuff selected on the current screen.
-
- Also, sometimes when mx starts up on a ds3100 I get just the frame of
- the window with no contents. It doesn't get filled in for at least 10
- seconds, although if I click the mouse in the window it gets filled in
- immediately.
-
- 259.
- Subject: kgdb bug
- Date: Tue, 15 Aug 89 12:33:29 PDT
- From: Fred Douglis <douglis>
-
- after reading a new symbol table i was not able to call functions. i
- had to exit and restart gdb instead.
-
-
- 260.
- Date: Tue, 15 Aug 89 12:47:48 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: compat error message
-
- All of a sudden, commands in two windows hung. One was an ls and the other
- was a "msgs". After about a minute, they both finally said
-
- "compat: Cannot decode user status value 0xffffffff"
-
- The ls finished, but msgs kept printing it over and over, slowly.
-
- My syslog window repeatedly said:
-
- RpcDoCall: <open> RPC to oregano is hung
- <open> RPC exit 0xffffffff
-
-
- 261.
- Date: Tue, 15 Aug 89 14:51:03 PDT
- From: brent (Brent Welch)
- Subject: Mint crash Aug 15
-
- Mint crashed after receiving the wrong reply message from Oregano.
- It hit a bug error in FsSpriteOpen, the client-side RPC stub.
- The return packet seemed garbled, and in fact it turned out to
- be the reply packet for a stat RPC, not an open. Oregano was
- being sluggish in responding to RPCs (a sign of a network interface
- that needs to be reset), and when mint retransmitted a request
- Oregano responded with the incorrect reply. Oregano seemed
- to resend a stat reply with the message ID and command field
- associated with an open RPC. The bogus reply was followed
- immediately by the transmission of the good open reply.
- This means that the scatter gather mechanism in the interface
- took the RPC header from one packet and the parameter block
- from another (just a theory).
- The trace went something like:
-
- Open request retransmitted by mint (flags == Qp)
- Open reply with parameter block from a stat
- Open reply with good parameter block
-
- From kgdb you can dump the RPC trace with
- (kgdb) print Rpc_PrintTrace(50)
-
- From the console keyboard you can reset a Sun3's network
- interface with L1-n. Before this problem I noticed
- several complaints from the nfsmount processes on Oregano
- about RPC timeouts to the NFS server. Anyway, there
- are a number of possible things to do, beginning with nothing.
- Beyond that, sanity checks can be added to all RPC stubs,
- which is probably a good idea, although it will add overhead.
- Finally, we could periodically reset the Intel ethernet
- interfaces, which apparently have a reputation for being
- flaky. Currently the RPC system will do the reset when
- it receives apparently garbled packets, but that didn't
- kick in this case.
- brent
- ps. This isn't the first time Oregano's ethernet interface
- has acted up and returned bogus packets to clients.
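-
- The stub sanity checks Brent suggests could look roughly like this. The
- ReplyHeader layout and field names are assumptions for illustration, not
- Sprite's actual RPC header.
-
```c
#include <stdint.h>

/* Hypothetical reply header; not Sprite's real layout. */
typedef struct {
    uint32_t id;          /* message ID copied from the request */
    uint32_t command;     /* RPC command number, e.g. open or stat */
    uint32_t paramSize;   /* bytes in the parameter block */
} ReplyHeader;

/* Accept a reply only if it answers the request actually sent and its
   parameter block has the size expected for that command's reply. */
int reply_matches(uint32_t sentId, uint32_t sentCommand,
                  const ReplyHeader *reply, uint32_t expectedParamSize)
{
    if (reply->id != sentId) {
        return 0;                 /* reply to some other request */
    }
    if (reply->command != sentCommand) {
        return 0;                 /* e.g. a stat reply for an open */
    }
    if (reply->paramSize != expectedParamSize) {
        return 0;                 /* header and parameter block disagree */
    }
    return 1;
}
```
-
- In this incident the bogus reply carried the open RPC's message ID and
- command. Only a check on the parameter block itself would have caught it:
- either its size, assuming open and stat replies differ in size, or a checksum.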
-
-
- 262.
- Date: Tue, 15 Aug 89 15:30:22 PDT
- From: shirriff (Ken Shirriff)
- Subject: more on ds3100
-
- If I do "more" on a file, then do a search with "/" for something that
- isn't in the file, I get "Pattern not found" and then "Segmentation
- violation".
-
-
- 263.
- Subject: more unrepeatable ds3100 errors
- Date: Tue, 15 Aug 89 15:45:36 PDT
- From: Fred Douglis <douglis>
-
- for the record:
-
- cd /a/attcmds/more; pmake newtm
-
- resulted in the complaint "userMap: undefined variable" and no
- md.mk file being created. A second mkmf worked fine.
-
- cd kernel/fs; rm ds3100.md/*.o; pmake
-
- done this morning to 'show off' to bks. all i did was show
- off how sprite is flaky, because one of the compilations
- returned with exit status 1 even though no error messages were
- produced and a second make worked just fine.
-
-
- 264.
- Date: Tue, 15 Aug 89 19:31:29 PDT
- From: mendel (Mendel Rosenblum)
- Subject: bug in sun3/sun4 timer code.
-
- The Sprite timer code on the sun3 and sun4 doesn't handle the case of the
- chip running backwards. This causes gettimeofday() on the sun3 and sun4
- to sometimes run backwards. The chip seems to do two things wrong.
-
- 1) The hundredths register sometimes reads out values greater than 99. I
- have seen values as great as 127 come out. This causes the time
- returned to be unnormalized, with 1,000,000 or more microseconds. This
- seems pretty easy to detect and fix.
-
- 2) Other times the hundredths register appears to jump forward and settle back again.
- I've seen the hundredths register go (31, 62, 32, 33) on successive
- reads. This seems harder to detect and fix. It appears that one
- cannot trust the timer chip to keep track of the time of day at a fine
- grain.
-
- Any suggestions on how to get around this problem? The easiest fix I
- can think of is to just prohibit time from ever going backwards.
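-
- The following sketch shows both fixes: it normalizes an overflowed microsecond
- field and never lets the returned time go backwards. The names are
- hypothetical, and the Time struct stands in for the kernel's own time type.
-
```c
typedef struct {
    long seconds;
    long microseconds;
} Time;

static Time lastReturned;   /* most recent time handed out */

/* Carry any excess microseconds into seconds (the usual normalization). */
Time normalize_time(Time t)
{
    while (t.microseconds >= 1000000) {
        t.microseconds -= 1000000;
        t.seconds += 1;
    }
    return t;
}

/* Normalize a raw chip reading and clamp it so it never runs backwards.
   Not reentrant; a kernel version would need a lock or interrupts off. */
Time monotonic_time(Time raw)
{
    Time t = normalize_time(raw);
    if (t.seconds < lastReturned.seconds ||
        (t.seconds == lastReturned.seconds &&
         t.microseconds < lastReturned.microseconds)) {
        t = lastReturned;        /* chip went backwards: repeat last value */
    }
    lastReturned = t;
    return t;
}
```
-
- Fed Mendel's sequence of hundredths (31, 62, 32, 33) within one second,
- monotonic_time returns .31, .62, .62, .62, so time stalls briefly instead of
- running backwards. Carrying a bogus 127 into the seconds can make the time
- jump ahead, and the clamp then holds it until the real clock catches up.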
-
-
- 265.
- From: rab (Robert A. Bruce)
- Subject: trashed file
- Date: Tue, 15 Aug 89 20:11:56 PDT
-
- /sprite/src/daemons/ipServer/RCS/stat.h,v is trashed. I moved
- it to /sprite/trashed. I will try and restore the file from
- a dump tape.
-
-
-
- 266.
- Date: Wed, 16 Aug 89 17:24:19 PDT
- From: shirriff (Ken Shirriff)
- Subject: ds3100 man problem
-
- Running "man command" where command.man is a new man page that hasn't been
- nroffed yet yields "sh: nroff: not found".
-
-
- 267.
- Date: Wed, 16 Aug 89 17:43:38 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: mx selection problem
-
- Here is the scenario.
-
- I rlogin to a sun3 from a ds3100.
- I run mx on the sun3 such that it is displayed on my ds3100.
- I select something not in the mx window.
- I try to paste it into the mx window.
-
- The problem:
-
- After a long pause I get:
- Tried to use selection, but nothing's selected.
-
- I can now have two selections on my screen -- one in the mx window and one
- in any other window. None of the windows will recognize the "enemy"
- selection.
-
-
- 268.
- Date: Wed, 16 Aug 89 17:57:36 PDT
- From: shirriff (Ken Shirriff)
- Subject: ds3100 dbx dies
-
- dbx bombs out on me and leaves me with
- dbx: internal error: pwait: pid 591408 not found
- after I set a bunch of breakpoints.
- The sequence of events is in ~shirriff/dbxbug.
-
-
- 269.
- From: rab (Robert A. Bruce)
- Subject: ipServer on allspice
- Date: Wed, 16 Aug 89 23:09:56 PDT
-
- Allspice's ipServer crashed. I tried to debug it, but it
- died before I could get a stack trace. There was a suspicious
- message on the console:
-
- Intel: spurious interrupt (2)
-
- but I don't know if it is related.
- I put a copy of `restartservers' in /hosts/allspice.
-
-
- 270.
- From: rab (Robert A. Bruce)
- Message-Id: <8908170626.AA855356@sprite.Berkeley.EDU>
- To: bugs
- Cc: rab
- Subject: rlogind
- Date: Wed, 16 Aug 89 23:26:05 PDT
-
- I opened an allspice window, set the termcap and then typed `clear'.
- I got the following message:
-
- PdevServiceRequest, bad request magic # 0x31c1113
-
- The window froze up and rlogind was in the debugger. I opened another
- window, and tried the same thing. It didn't work, so I typed `exit'.
- The window froze and a second rlogind was in the debugger.
-
-
- 271.
- Subject: DEFTARGET bug
- Date: Wed, 16 Aug 89 23:29:46 PDT
- From: Fred Douglis <douglis>
-
- this has been brought up before: TM defaults to sun3 if not set
- explicitly. I have "TM=$MACHINE" in my PMAKE environment variable and
- it's worked well for me. John H. doesn't and he was not able to do
- mkmf using the modified tm.mk because TM was set explicitly to sun3
- even though the target was really "dependall" and TM didn't matter.
-
- I'd like to change all references of the form
-
- TM ?= @(DEFTARGET)
-
- to
- TM ?= $(MACHINE)
-
- any problems with this?
-
-
- 272.
- Date: Thu, 17 Aug 89 00:22:15 PDT
- From: shirriff (Ken Shirriff)
- Message-Id: <8908170722.AA984622@sprite.Berkeley.EDU>
- To: bugs
- Subject: ds3100 nroff bug
-
- The problem with nroff occurs in the environment saving function caseev.
- A bunch of variables are defined in ni.c:
- int block = 0;
- int ics;
- int icf; ... etc ...
- int *hyptr[NHYP] = {0}; ... etc ...
-
- Then caseev does read(.., (char *)&block, LENGTH_OF_EVERYTHING), which
- is supposed to read in all these variables in one fell swoop. However,
- this assumes the variables are stored consecutively, which they are
- on the sun. However, on the ds3100, the initialized arrays are put
- before everything else, so the reads and writes are modifying the
- wrong variables.
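-
- One layout-independent fix is to gather these variables into a single
- struct. A single read or write of the whole struct then covers exactly those
- fields, whatever order the compiler chooses for globals. The sketch below uses
- stdio for brevity rather than nroff's read/write calls. The field list is
- abbreviated, and the NHYP value is an assumption.
-
```c
#include <stdio.h>

#define NHYP 10   /* assumed value for illustration */

/* All per-environment variables in one contiguous object. */
struct env {
    int block;
    int ics;
    int icf;
    /* ... remaining environment variables ... */
    int *hyptr[NHYP];
};

struct env curEnv = { 0 };

/* Save or restore the whole environment in one call. */
int save_env(FILE *fp)
{
    return fwrite(&curEnv, sizeof curEnv, 1, fp) == 1 ? 0 : -1;
}

int restore_env(FILE *fp)
{
    return fread(&curEnv, sizeof curEnv, 1, fp) == 1 ? 0 : -1;
}
```
-
- The rest of nroff would then refer to curEnv.block and the other fields
- instead of the bare globals.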
-
-
- 273.
- Date: Thu, 17 Aug 89 08:30:54 PDT
- From: ouster (John Ousterhout)
- Subject: Re: rlogind
-
- The rlogind bug Bob reported sounds just like a bug Mike found in
- the ipServer, where the kernel was reporting more data in the pdev
- request buffer than was really there, causing the server process
- to try to handle an extra request. The ipServer also died with a
- bad magic number. Since Brent was away on vacation, Mike just
- patched the ipServer to ignore bad requests. I think that the
- problem is pretty reproducible on ds3100's: just take the patch
- out of ipServer and try to run X.
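-
- The general defense is for a pdev server not to trust either the byte count
- or the contents blindly. The following hypothetical sketch uses a header
- layout and REQUEST_MAGIC value that are assumptions, not Sprite's pdev
- definitions:
-
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQUEST_MAGIC 0x5a5a5a5aU   /* assumed value */

typedef struct {
    uint32_t magic;
    uint32_t length;   /* total bytes in this request, header included */
} RequestHeader;

/* Walk buf[0..size) and count well-formed requests. Stop at the first
   request with bad magic, a bogus length, or one that doesn't fit. */
int count_valid_requests(const unsigned char *buf, size_t size)
{
    size_t offset = 0;
    int count = 0;
    while (size - offset >= sizeof(RequestHeader)) {
        RequestHeader hdr;
        memcpy(&hdr, buf + offset, sizeof hdr);   /* avoid misaligned reads */
        if (hdr.magic != REQUEST_MAGIC || hdr.length < sizeof hdr ||
            hdr.length > size - offset) {
            break;        /* treat the rest of the buffer as garbage */
        }
        /* A real server would dispatch the request here. */
        count++;
        offset += hdr.length;
    }
    return count;
}
```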
-
-
- 274.
- Date: Thu, 17 Aug 89 09:12:58 PDT
- From: ouster (John Ousterhout)
- Subject: Bad News on Dinner
-
- It appears that Mint was inaccessible through the network all yesterday
- afternoon and night. Martha Zimet came by late yesterday afternoon
- to say she hadn't been able to login to Mint all afternoon. I was able
- to rlogin from mace, so I didn't look any further. However, this morning
- she was still unable to login. I went upstairs and restarted all Mint's
- daemons, which fixed the problem. Portmap had been in the debugger. In
- my haste to get things going for Martha I just killed it. In retrospect
- I should have taken a look with the debugger.... sorry about that. What
- is portmap, anyway? Mint was refusing rlogin's and rsh's, but honoring
- pings and rcp's.
-
- There were no network daemons running on Allspice this morning either.
- I restarted them.
-
- By the way, mail apparently wasn't getting through yesterday either:
- once I restarted the daemons, a flood of day-old internet mail arrived for
- me.
-
-
- 275.
- Date: Thu, 17 Aug 89 09:39:40 PDT
- From: ouster (John Ousterhout)
- Subject: /tmp disappeared again
-
- After Oregano's crash and reboot this morning, /tmp was gone again.
- I added back the symbolic link to /c/tmp. I'm beginning to suspect
- that Oregano's boot scripts are responsible for this.
-
-
- 276.
- Date: Thu, 17 Aug 89 09:44:27 PDT
- From: mendel (Mendel Rosenblum)
- Subject: someone broke mkmf on ds3100
-
- When I try to mkmf a directory on a ds3100 I get the message
- "/sprite/lib/pmake/tm.mk", line 91: Undefined variable "$("
- Fatal errors encountered -- cannot continue
-
- Sure enough, line 91 of tm.mk is
-
- syntax_error: $(
-
- I have commented this line out so I can do mkmf.
-
-
- 277.
- Date: Thu, 17 Aug 89 10:01:19 PDT
- From: mendel (Mendel Rosenblum)
- Subject: oregano crash
-
- Oregano crashed this morning with a bus error in FsWriteBackDesc(). It
- looked like FsDomainFetch() must have returned a bad domain pointer.
-
-
- 278.
- Date: Thu, 17 Aug 89 11:39:50 PDT
- From: ouster (John Ousterhout)
- Subject: Pmake lost characters
-
- I just did a "pmake install TM=sun3" in kernel/dev, and at the very
- end of the pmake the following output occurred:
-
- ...
- devTty.c:
- mv llib-ldev.ln sun3.md/llibrm -f sun3.md/llib-ldev.ln
- usage: mv [-if] file1 file2 or mv [-if] file/directory ... directory
- *** Error code 1
- pmake: 1 error
-
- I reran the pmake, and it then worked OK, producing the following output:
-
- ...
- devTty.c:
- mv llib-ldev.ln sun3.md/llib-ldev.ln
- --- ../Lint/sun3.md/dev.ln ---
- rm -f ../Lint/sun3.md/dev.ln
- /sprite/cmds.sun3/cp sun3.md/llib-ldev.ln ../Lint/sun3.md/dev.ln
-
- It looks like command lines from two different targets may have gotten
- scrambled together. As I remember, this is similar to the problems
- people have been having on the DS3100s, but this particular example
- was on a Sun-3.
-
-
- 279.
- Date: Thu, 17 Aug 89 11:40:56 PDT
- From: brent (Brent Welch)
- Subject: Re: rlogind
-
- There were several rlogind in the DEBUG state. Each one seemed
- to die in a different spot. gdb also died after looking around
- a little bit. I think rlogind's memory image got trashed, and
- I suspect the cache write-back problem that allspice had
- a couple days ago. We continued allspice after a cache write
- back protection error, and rlogind ended up in the debugger at
- that time. Perhaps the cached page table for rlogind has a
- bogus value, so any rlogind will eventually die? There are
- probably ways to flush segments and check this, but I don't
- remember them.
-
-
- 280.
- Date: Thu, 17 Aug 89 12:15:11 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: merge, rcsmerge
-
- Neither of these will work on a ds3100 because they depend on
- /sprite/lib/$TM.md/diff3. This is one of those cases where unix has
- a shell script front end to the real program. Our diff3 is from GNU and
- doesn't have the back end. Merge uses the backend directly so we're
- hosed. I don't see why merge can't go through the front end -- I'll
- look into it.
-
-
- 281.
- Date: Thu, 17 Aug 89 14:10:39 PDT
- From: pmchen (Peter M. Chen)
- Subject: using news to send mail
-
- Using news to send mail doesn't change the machine name to sprite. I guess it doesn't use the
- same sendmail program. E.g.
-
- From: pmchen@mustard.Berkeley.EDU (Peter M. Chen)
-
-
- 282.
- Date: Thu, 17 Aug 89 14:49:47 PDT
- From: pmchen (Peter M. Chen)
- Message-Id: <8908172149.AA339000@sprite.Berkeley.EDU>
- To: bugs
- Subject: program running when sun4 crashed
-
- I was on raid, running a program which started up a lot of processes talking
- to one disk, and it crashed (see Rich Drewes's soon to be ensuing message,
- or previous message, depending on who mails first).
-
- I was running the following program:
- mult4 /dev/rsvj00 600000 type/1 size/1 0 20 20 0 0 10
-
- I've run the same program other times without crashing. One stress on the
- system might be the number of processes (20) forked off.
-
- We'll try to repeat the crash...more later
-
-
- 283.
- Date: Thu, 17 Aug 89 14:57:58 PDT
- From: drewes (Richard Drewes)
- Subject: Sun 4 bug
-
- hi hi hi,
-
- raid, a Sun 4 gets occasional hard crashes that necessitate a power cycle
- (watchdog reset results in a permanently blank screen). The console error
- message is:
-
- MachPageFault: Bus error in user proc 31e12, PC = 9424, addr = 4 BR Reg 80
- Fatal Error: Mem_Free: storage block already free
- Entering debugger with a Interrupt Trap (16) exception at PC 0xf607e6f0
-
- Peter Chen is sending you the code that generated the error.
-
- Another, possibly related error I have encountered is not quite as fatal: it
- just prints a segmentation fault sometimes when I manipulate large blocks
- of malloced data (like 100KB). Thanks for your attention, O Sprite God.
-
-
- 284.
- Date: Thu, 17 Aug 89 19:47:03 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: problem with sun4.md/machConst.h
-
-
- The sun4 version of machConst.h redefines a bunch of sys variables
- (like SYS_NUM_SYSCALLS). This isn't very convenient for adding new
- system calls.
-
-
- 285.
- Subject: rcs check-out error
- Date: Thu, 17 Aug 89 22:42:22 PDT
- From: Fred Douglis <douglis>
-
- i tried to check out mach/ds3100.md/machAsm.s. I got:
-
- co -l machAsm.s
- RCS/machAsm.s,v --> machAsm.s
- revision 1.5 (locked)
- co error: Can't check-out new copy of machAsm.s. Old copy saved.
-
- Since i was replacing the file anyway, i moved the RCS file to
- machAsm.s.bak,v and then just recreated machAsm.s with the copy mike
- sent me. so much for source control...
-
-
- 286.
- Date: Fri, 18 Aug 89 09:25:52 PDT
- From: mendel (Mendel Rosenblum)
- Message-Id: <8908181625.AA332066@sprite.Berkeley.EDU>
- To: bugs
- Subject: mkmf defaults problem
-
- It used to be that when you typed mkmf and had only one ".md" directory, that
- machine type would be the default. Someone has changed this. Now it
- sets TM to default to $MACHINE. When I type pmake on the sun3 with only
- a sun4.md directory I get the following output:
-
- murder% pmake
- --- sun3.md/fs.new.o ---
- rm -f sun3.md/fs.new.o
- ld -r -o sun3.md/fs.new.o
- ld: no input files
- *** Error code 1
- pmake: 1 error
-
- I like the way it was before better.
-
-
- 287.
- Subject: ds3100 crash starting recovery
- Date: Fri, 18 Aug 89 14:53:48 PDT
- From: Fred Douglis <douglis>
-
- the moment kvetching enabled RPCs it died by jumping to pc 0. This
- was after printing that it was starting recovery with mint. Looks
- like it got something bad from mint that it didn't protect itself
- against when going through its jump tables.
-
-
- 288.
- Subject: Re: ds3100 crash starting recovery
- Date: Fri, 18 Aug 89 15:01:02 PDT
- From: Fred Douglis <douglis>
-
- Actually, a more precise description of the bug, now that i realize
- what happened. I had a "cat /hosts/kvetching/dev/syslog" running on
- mint in order to tweak recovery when kvetching was down. The crash
- was repeatable when the cat was running, and kvetching booted just
- fine once i killed the cat process.
-
-
- 289.
- Date: Fri, 18 Aug 89 15:43:55 PDT
- From: ouster (John Ousterhout)
- Subject: Bug: processes not dying
-
- I've been having a lot of trouble lately with processes not dying,
- either when I type "kill" to gdb, or when gdb exits. About half
- the time gdb just hangs until I type "killdebug" in another window
- (thank-you Ken for this convenience). In the past the processes
- have occasionally not died, but it's never hung gdb like this before.
-
-
- 290.
- Date: Sat, 19 Aug 89 10:24:58 PDT
- From: ouster (John Ousterhout)
- Subject: Piracy in debugger again
-
- I'm beginning to wonder if maybe something is wrong with Piracy, since
- it ends up in the debugger so much more often than other DS3100's, even
- though I'm not actually using it. Right now it's in the debugger with
- the message
- "Bad kernel TLB Fault
- Syncing disks ...
- Entering debugger with a TLB LD miss exception at PC 0x8"
-
-
- 291.
- Date: Sat, 19 Aug 89 10:53:21 PDT
- From: gibson (Garth Gibson)
- Subject: MachTrap in tx on default kernel (Brent sun3) (8 Jul 89 18:49:45)
-
- I was running vi in a tx window this morning (on the oldest kernel
- I can find - the one that generally runs forever) and the tx process
- took a bus error:
- MachTrap: Bus error in user proc 4051f, PC = dad4, addr = 2a2f0a84 BR Reg 0
- garth
-
-
- 292.
- Date: Sun, 20 Aug 89 10:39:14 PDT
- From: brent (Brent Welch)
- Subject: X on ds3100
-
- I tried to use cardamom today, Sunday. After finally finding
- /ultrix/cmds/Xmfb I invoked it via xinit.
-
- xinit tx -D -title Console -e ~/bin/xstart sprite:0 -- /ultrix/cmds/Xmfb
-
- The background pattern appeared for about two seconds and then the
- screen went blank. I am currently logged into cardamom and see
- no trace of xinit or Xmfb, but I can't use the screen. Is this
- a case of not being able to restart X because of an interaction
- with the ipServer?
- By the way, what is the one true way of starting X on
- a ds3100? Why isn't it easy to figure out? Also, the xinit
- I started was probably /X/cmds.ds3100/xinit, not the one
- in /ultrix/cmds.
-
-
- 293.
- Date: Mon, 21 Aug 89 10:09:37 PDT
- From: mendel (Mendel Rosenblum)
- Subject: loadavg error messages
-
- I've been getting messages of the form:
-
- <27>Aug 21 10:07:20 loadavg[11118]: Error evicting foreign processes: an argument to a call was invalid
-
- on murder. The kernel is:
- SPRITE VERSION 1.0 (JohnH sun3) (11 Aug 89 17:57:30)
-
-
- 294.
- Date: Mon, 21 Aug 89 10:10:25 PDT
- From: ouster (John Ousterhout)
- Subject: Bug in mx regexp search code
-
- If you select the last character in a file, enter a garbage string
- into the search window (one that won't match anything) and type ^B,
- the regexp code panics with "Pointer error!".
-
-
- 295.
- Subject: trashed file
- Date: Mon, 21 Aug 89 10:33:02 PDT
- From: Fred Douglis <douglis>
-
- /user1/douglis/Mail/drafts/1 should have contained a Mail draft that I
- was trying to save last night when allspice must have crashed.
- Instead, it contained something that looks like part of an mx log for
- a file called "versions". I moved it to /user1/trashed/MH-mxlog.
-
-
- 296.
- Date: Mon, 21 Aug 89 10:55:31 PDT
- From: pmchen (Peter M. Chen)
- Subject: lprm dies
-
- The printer in our office (508-5), pulla, was not printing, so I tried
- to lprm a job. lprm -Plw547 (nobody has changed the name of the printer
- from 547 to 508-5) <jobnumber> returned:
- *** compat: Invalid message # for Gen module: status = 0x4e22
- *** compat: Invalid message # for Gen module: status = 0x4e22
- socket: Can't find my hostname
-
- Debug
-
- This might be because envy is currently down and is returning a weird
- error message.
-
-
- 297.
- Subject: tx bug: large selection hung window
- Date: Mon, 21 Aug 89 11:44:43 PDT
- From: Fred Douglis <douglis>
-
- I tried using ^V to stuff a very large selection, and my tx hung.
- It's process 70216 on kvetching if someone wants to look at it (I
- threw it into the debugger).
-
-
- 298.
- Date: Mon, 21 Aug 89 13:15:01 PDT
- From: ouster (John Ousterhout)
- Subject: Bug: xinit needs to be tuned for Sprite
-
- From stolcke@icsib8.Berkeley.EDU Mon Aug 21 11:59:04 1989
- Received: from icsib.Berkeley.EDU by sprite.Berkeley.EDU (5.59/1.29)
- id AA08262; Mon, 21 Aug 89 11:59:02 PDT
- Received: from icsib8. (icsib8.Berkeley.EDU) by icsib.Berkeley.EDU (4.0/
- SMI-4.0)
- id AA00269; Mon, 21 Aug 89 11:59:13 PDT
- Received: by icsib8. (4.0/SMI-4.0)
- id AA15809; Mon, 21 Aug 89 11:59:08 PDT
- From: stolcke@icsib8.Berkeley.EDU (Andreas Stolcke)
- Message-Id: <8908211859.AA15809@icsib8.>
- To: ouster@sprite.Berkeley.EDU (John Ousterhout)
- Subject: Re: Anyone use these things?
- In-Reply-To: Your message of Fri, 18 Aug 89 15:21:57 -0700.
- <8908182221.AA203572@sprite.Berkeley.EDU>
- Date: Mon, 21 Aug 89 11:59:06 PDT
-
-
- Yes, xinit is supposed to give a basic X startup. It should also do so
- when called without any arguments. I think this currently isn't
- the case on Sprite for two reasons:
-
- xinit expects 'X' to be a link to the local X server binary, which is
- then used as the default server. So in Sprite, 'X' should probably
- point to 'Xsprite'.
-
- xinit invokes 'xterm' as the default terminal emulator client. But since
- a bunch of options go along with this a link from 'xterm' to 'tx' won't
- do.
- Offhand I can think of at least three ways of fixing this: either make
- the option handling in tx a superset of xterm's, or change the default
- command line compiled into xinit, or write up a shell script that (sort
- of) emulates xterm's options calling tx.
-
-
- 299.
- Subject: access times
- Date: Mon, 21 Aug 89 13:49:51 PDT
- From: Fred Douglis <douglis>
-
- looks like access times for binaries are updated only sporadically.
- if i do an ls, and then "ls -lu /bin/ls" it looks like it is getting
- updated. but if i do some other things and then ls -lu on them, they
- aren't updated. (a particular example is
- /sprite/cmds.ds3100/mh/scan).
-
-
- 300.
- Date: Mon, 21 Aug 89 14:32:47 PDT
- From: ouster (John Ousterhout)
- Subject: Can't compile for DS3100
-
- I've been trying to recompile Mx and Tx for the ds3100, but
- I keep getting messages like "ld: Can't locate file for: -ltcl_g with -B1.31".
- Does anyone know what this error message means? The file seems to
- exist in /sprite/lib/ds3100.md/libtcl.a.
-
-
- 301.
- Date: Mon, 21 Aug 89 15:14:33 PDT
- From: brent (Brent Welch)
- Subject: Re: access times of binaries
-
- This is the situation that caused some confusion
- on Fred's part. If a program is being executed then
- the system always returns "now" as the current access
- time. This is done to avoid the overhead of contacting
- all the hosts that might be executing the program.
- However, this access time is not propagated back to
- the file descriptor (bug #1). So, if you use the ls
- program to look at the access time of the ls program,
- you'll always get "now". If you use another program,
- stat for example, you'll get the access time in the file
- descriptor. A related bug is that (I think) demand loading
- a file from a remote server might take a path that doesn't
- update the access time on the binary file. Specifically,
- FsCacheRead updates the access time, but FsCacheBlockRead
- does not. Normally FsCacheRead is called on the client
- and FsCacheBlockRead is called on the server in response
- to requests for whole blocks. For non-VM accesses FsCacheRead
- updates the access time at the client cache, and this eventually
- gets back to the file server. However, VM uses Fs_PageRead
- and the object-specific BlockRead routines, which do not
- properly set the access time on the client.
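A minimal sketch of the "report now while executing" rule described above, plus the missing step of writing that value back (bug #1). `FileAttrs`, `execCount`, and `report_access_time` are hypothetical names, not Sprite's fs code, and a real fix would also have to push the updated time back to the file server.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the attributes kept in a file descriptor. */
typedef struct {
    time_t accessTime;  /* access time recorded in the descriptor */
    int    execCount;   /* hosts currently executing the file (assumed) */
} FileAttrs;

/*
 * Access time reported by stat(). While the file is being executed,
 * report `now` instead of contacting every host that might be running
 * it. When `propagate` is true, also store that value back into the
 * descriptor, so a later stat() does not appear to move backwards.
 */
time_t report_access_time(FileAttrs *attrs, time_t now, bool propagate)
{
    if (attrs->execCount > 0) {
        if (propagate) {
            attrs->accessTime = now;
        }
        return now;
    }
    return attrs->accessTime;
}
```

With `propagate` false (the current behavior), `ls -lu` on a running binary shows "now" while stat on the same file afterwards shows an older time, which matches Fred's confusion.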
-
-
- 302.
- Date: Mon, 21 Aug 89 17:43:50 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: L1 keys on sun4
-
- A while back there were some complaints about the L1 keys not working at times
- on the sun4. It turns out that some people's mainHook.c files (not mine,
- of course, or I would have experienced the same problem!) set main_DoDumpInit
- to FALSE. If it is false, the routines for the different L1 keys are not
- initialized. I don't know why this is a variable, or why anyone would want
- to set it to false, but this explains why various people's sun4 kernels had
- this trouble.
-
- Another complaint is that L1A won't work sometimes. If the machine has wedged
- itself at a time when interrupts were off, then nothing from the keyboard
- will work. At that point, you need to watchdog reset it or the equivalent.
- If a machine does this, this is a bug, since it should not have wedged itself,
- naturally. This can happen easily if you are debugging a sun4 kernel and the
- debugger protocol messes up and it starts timing out. Fixing the debugger will
- help, and figuring out a way to re-enable keyboard interrupts in the debugger
- will also help. Both should happen eventually.
-
-
- 303.
- Date: Mon, 21 Aug 89 21:38:04 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: bug in mx
-
- I'm running the new mx (Aug 21 14:34) on a ds3100 running kernel 1.002
- and every time I select something and type ^B I get the message
- "Pointer error!" in the shell that started the mx, and the mx goes into
- the debugger.
-
-
- 304.
- Date: Tue, 22 Aug 89 10:17:22 PDT
- From: mendel (Mendel Rosenblum)
- Subject: Re: bug in mx
-
- > I'm running the new mx (Aug 21 14:34) on a ds3100 running kernel 1.002
- > and every time I select something and type ^B I get the message
- > "Pointer error!" in the shell that started the mx, and the mx goes into
- > the debugger.
-
- Tx does the same thing when you do a meta-b.
-
-
- 305.
- Date: Tue, 22 Aug 89 12:21:32 PDT
- From: gibson (Garth Gibson)
- Message-Id: <8908221921.AA267821@sprite.Berkeley.EDU>
- To: bugs
- Subject: ds3100: spritemon
-
- spritemon with no args works but:
- spritemon -ufv%iH 35
- Bad user TLB fault in process 31619: pc=401904 addr=4
-
- Segmentation violation
-
-
- 306.
- Subject: bug: ds3100 tx garbage pointer
- Date: Tue, 22 Aug 89 12:45:15 PDT
- From: Fred Douglis <douglis>
-
- I was trying to select something and hit the debugger with the following
- stack. mxwPtr is garbage.
-
- > 0 CharToLine(mxwPtr = 0x205d676e, position = (...)) ["mxDisplay.c":888, 0x4171d8]
- 1 MxRedisplayRange(mxwPtr = 0x205d676e, first = (...), last = (...)) ["mxDisplay.c":1270, 0x417ebc]
- 2 MxHighlightSetRange(hlPtr = 0x1002eb20, first = (...), last = (...)) ["mxHighlight.c":234, 0x414b5c]
- 3 MxMarkParens(fileInfoPtr = 0x1001b640, position = (...)) ["mxCmdUtils.c":586, 0x413014]
- 4 .block79 ["mxCmdUtils.c":778, 0x4135d0]
- 5 MxMouseProc(mxwPtr = 0x10025168, eventPtr = 0x7fdff818) ["mxCmdUtils.c":778, 0x4135d0]
- 6 Sx_HandleEvent(eventPtr = 0x7fdff818) ["sxDispatch.c":442, 0x42032c]
- 7 Tx_WindowEventProc(display = 0x10017bb8) ["txWindow.c":1240, 0x403e74]
- 8 .block249 ["fsDispatch.c":328, 0x44bf40]
- 9 Fs_Dispatch() ["fsDispatch.c":328, 0x44bf40]
- 10 .block1 ["tx.c":135, 0x4004cc]
- 11 main(argc = 9, argv = 0x7fdffa14) ["tx.c":135, 0x4004cc]
-
-
- 307.
- Date: Tue, 22 Aug 89 12:47:18 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: bug in mv
-
- There is a bug moving a file to a symbolic link to itself. For example,
- I created the file /tmp/foo, and the symbolic link /sprite/tmp/foo ->
- /tmp/foo. I then did "mv /tmp/foo /sprite/tmp/foo". I get the
- message "mv: /tmp/foo: rename: invalid argument", and worst of all, the
- file /tmp/foo disappears.
-
-
- 308.
- Date: Tue, 22 Aug 89 13:06:39 PDT
- From: gibson (Garth Gibson)
- Subject: ds3100
-
- from vi i issued ":e ~/bin/xstart"
- and it failed with the message: "/sprite/cmds.sun4/csh" exec format error, followed by "No match"
-
-
- 309.
- Date: Tue, 22 Aug 89 14:06:41 PDT
- From: gibson (Garth Gibson)
- Message-Id: <8908222106.AA66877@sprite.Berkeley.EDU>
- To: bugs
- Subject: restarting x11 on the ds3100
-
- doesn't work - it goes into a loop waiting for server to start.
- although once a user is established this is not a problem, every
- newuser is going to die trying to get his x environment right
-
-
- 310.
- Subject: Re: ds3100: spritemon
- Date: Tue, 22 Aug 89 14:10:49 PDT
- From: Fred Douglis <douglis>
-
- I went up to Garth's office to see why spritemon died for him but not
- for me. It was dereferencing a null pointer to font info because the
- font it tried to open didn't exist. It should complain that it can't
- open the font. Looks like this is actually an X toolkit problem, so I
- don't know how it would be fixed or by whom....
-
-
- 311.
- Subject: nfsmount bug
- Date: Tue, 22 Aug 89 14:28:37 PDT
- From: Fred Douglis <douglis>
-
- oregano's mount of /chip went into the debugger. the gcore file is in
- /tmp/nfsmount.core.22628 if anyone wants it. not only did operations
- on chip hang, but garth said that his Mail process got hung reading
- mail on sprite.
-
-
- 312.
- Date: Tue, 22 Aug 89 14:52:16 PDT
- From: gibson (Garth Gibson)
- Subject: ds3100: X meta key develops a "lock" mode
-
- I don't know how, but I got into a state on the ds3100 in X
- where meta Press and Release events alternated each time the
- key was pressed (but didn't when it was released). Caused
- some funky behaviour when I started typing in a tx window with
- the meta key locked on!
- I tore down x (killed the server processes, restarted the servers, and
- restarted x), and it was better. Got to be those alpha particles!
- garth
-
-
- 313.
- Date: Tue, 22 Aug 89 21:02:16 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: assembler bug
-
- When assembling sparc code on a sun3, the assembler gets a bus error if you
- have a "bnz" instruction. This is a synonym for bne, and it isn't implemented,
- but this shouldn't cause a bus error. The assembler should report
- "unknown opcode" or such.
-
-
- 314.
- Date: Tue, 22 Aug 89 21:14:33 PDT
- From: gibson (Garth Gibson)
- Subject: sun3 gdb
-
- sometimes gdb on the sun3's hangs when i tell it to kill the program
- it is debugging. if i kill the process in another window, it proceeds
- just peachy keen
-
-
- 315.
- Date: Wed, 23 Aug 89 09:03:54 PDT
- From: ouster (John Ousterhout)
- Subject: No warning about disk full?
-
- I don't seem to be getting syslog warnings about disk partitions
- filling up anymore. I do get error returns in programs, such
- as "Couldn't open "sun4.md/mx": no space left in file system domain."
- But cache write-backs don't cause error messages. Is this
- intentional? I'm not sure it's good.
-
-
- 316.
- Date: Wed, 23 Aug 89 10:18:35 PDT
- From: brent (Brent Welch)
- Subject: Oregano ipServer crash
-
- Oregano's ipServer died in CallTimeoutHandler.
- Its timeoutList seemed ok, but a pointer that it used
- was bogus, readyPtr. This is an element it plucks from
- the list, so somehow it got confused.
-
-
- 317.
- Date: Wed, 23 Aug 89 13:14:25 PDT
- To: bugs
- Subject: Crashes leave display off
-
- It appears that Sprite doesn't turn on the display when it enters
- the debugger or exits to the boot ROM. This makes it somewhat harder
- to figure out what has happened when a machine crashes (Piracy always
- seems to crash when it's in screen-saver mode). I think this was a
- problem on Suns too. Seems like it ought to be easy to fix.
-
-
- 318.
- Subject: mach/ds3100.md/md.mk screwed up
- Date: Wed, 23 Aug 89 14:04:20 PDT
- From: Fred Douglis <douglis>
-
- All of its sources are for jhh.md instead of ds3100.md.
-
-
- 319.
- Subject: ds3100 doesn't sync clock
- Date: Wed, 23 Aug 89 15:32:40 PDT
- From: Fred Douglis <douglis>
-
- if a machine is in the debugger it doesn't increment its time of day
- clock, nor check it against reality. kvetching is now 15 minutes
- slow.
-
-
- 320.
- Date: Wed, 23 Aug 89 16:18:45 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: XCFLAGS funniness
-
- The lock tracing stuff wasn't getting turned on in the mem module and I
- traced it to the XCFLAGS. My kernel.mk file adds -DLOCKREG to XCFLAGS,
- and the local.mk file in mem adds -DMEM_TRACE. If I go to mem and type
- 'pmake spur' only the -DMEM_TRACE shows up, but if I type 'pmake TM=spur'
- they both do. Is there a pmake expert out there who knows why this is
- happening?
-
-
- 321.
- Subject: new hosts not being setup properly
- Date: Wed, 23 Aug 89 17:06:32 PDT
- From: Fred Douglis <douglis>
-
- for example, /hosts/{pepper,parsley,violence}/dev/syslog doesn't exist,
- and /hosts/pepper/dev doesn't even exist. wall complains.
-
-
- 322.
- Date: Wed, 23 Aug 89 17:13:17 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: NET_NUM_SPRITE_HOSTS is bogus
-
- The number of sprite hosts should not be a constant that is compiled into
- user-level programs. I think this should be obtained through a system
- call. This would allow us to simply restart programs when we change
- the maximum, rather than recompiling libc.a and therefore the world.
- Right now pmake is broken, making it difficult to recompile a new pmake.
-
-
- 323.
- Date: Wed, 23 Aug 89 17:17:24 PDT
- From: douglis (Fred Douglis)
- Subject: host bug
-
- Turns out it wasn't the addition of host #50 that broke loadavg, it was
- the addition of a blank line. I'm changing Host_Next to deal with it,
- but no one should insert a blank line in spritehosts until this change
- propagates to every program.
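Skipping blank lines while scanning a hosts file can be sketched as follows. The function names are hypothetical, this is not the actual Host_Next code, and treating '#' as a comment marker is an assumption about the spritehosts format.

```c
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if the line holds an entry, 0 if it is blank,
 * whitespace-only, or a '#' comment (comment syntax assumed). */
int host_line_is_entry(const char *line)
{
    while (*line != '\0' && isspace((unsigned char)*line)) {
        line++;
    }
    return *line != '\0' && *line != '#';
}

/* Reads the next entry line from fp into buf, skipping blank and
 * comment lines. Returns 1 if a line was read, 0 at end of file. */
int next_host_line(FILE *fp, char *buf, int size)
{
    while (fgets(buf, size, fp) != NULL) {
        if (host_line_is_entry(buf)) {
            return 1;
        }
    }
    return 0;
}
```

Because every caller goes through the same reader, a blank line in the file is invisible to programs like loadavg once they are relinked.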
-
-
- 324.
- Date: Wed, 23 Aug 89 18:03:21 PDT
- From: brent (Brent Welch)
- Subject: Oregano deadlock
-
- Oregano ran into a deadlock having to do with a call-back to
- a client during a file remove. I thought there was only
- one place that is used to wait for call-backs to complete,
- so I stuck my timeout handler at that spot. Unfortunately
- this other case slipped past me, so Oregano wedged after
- it apparently dropped a "consistency completed" RPC from thyme.
- I'll fix up the code so all client callbacks are guarded
- with a timeout. By the way, it is still conceivable that
- this was due to a larger, network-side deadlock problem,
- especially because Oregano's disks were filling up.
- brent
- PS. Mint was rebooted at the same time to get it running
- the latest sun3.new. Both machines had been up for almost
- 6 days!
-
-
- 325.
- Date: Wed, 23 Aug 89 19:34:09 PDT
- From: brent (Brent Welch)
- Message-Id: <8908240234.AA336200@sprite.Berkeley.EDU>
- To: bugs
- Subject: silent printer errors
-
- I still hate the printing system. I often have
- jobs that abort silently. I don't really care
- about any of the underlying problems, I just
- want a user-friendly system.
- brent
- ps. The files /sprite/spool/lpd/lw608-2/{ErrorLog,lw608-2-log}
- contain no useful information about this particular case.
-
-
- 326.
- Date: Wed, 23 Aug 89 13:28:31 PDT
- From: jhh (John H. Hartman)
- Subject: gdb died on sun4
-
- I was debugging the ipServer on allspice (kernel 1.002) and gdb died
- with the following message: "MachHandleWindowUnderflow: killing process!".
-
-
- 327.
- Subject: sun4 bug: allspice misbehaving
- Date: Thu, 24 Aug 89 10:48:21 PDT
- From: Fred Douglis <douglis>
-
- * allspice's ip server seems to crash much more often than on other
- machines.
-
- * allspice's "rup" entry truly is broken (unlike John's joke about
- mint & oregano).
-
- allspice- sun4 up 61+23:23 0.00 0.00 0.00 (idle 1+08:58:28)
-
- not only is the uptime off (which happens typically when rdate fails
- and the date isn't initialized, so /hosts/`hostname`/boottime is
- dated 1969, except that allspice's isn't), but allspice's count of
- migrated processes seems to be non-zero.
-
-
- 328.
- Subject: sun2 directories
- Date: Thu, 24 Aug 89 11:05:03 PDT
- From: Fred Douglis <douglis>
-
- someone moved (or removed) sun2.md all over the place, but at least
- some directories have not been remkmf'ed. So, "make all" generates a
- lot of complaints. Time for a world remkmf?
-
-
-
-
- 329.
- Date: Thu, 24 Aug 89 12:50:49 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Re: gdb died on sun4
-
- This isn't a bug, although the error message should be improved. This
- is what happens to user processes that mess up their stacks (unalign them or
- garbage them) and then get an underflow. It's sort of like a bus error or
- something, except that your choices of how to handle it inside an underflow
- trap are very limited. I'll make the error message more informative. This
- does prove, though, that the watchdog reset is gone, since it used to get a
- watchdog reset when it tried to print the string. I fixed that problem, which
- is why you now see this message.
-
-
- 330.
- Date: Thu, 24 Aug 89 15:14:27 PDT
- From: eklee (Edward K. Lee)
- Subject: rlogin to cory sometimes hangs
-
- Often times we can not rlogin to tonkawa nor raid even though the ipServer and
- inetd are running and the other non-Sprite machines in Cory are accessible.
- Rlogin daemons are spawned off but seem to immediately enter the DEBUG state.
- In fact, it seems like two rlogin daemons are spawned for each rlogin
- attempt. One of the daemons enters the DEBUG state and the other enters a
- wait state.
-
-
- 331.
- Date: Thu, 24 Aug 89 15:32:47 PDT
- From: eklee (Edward K. Lee)
- Subject: ditroff on ds3100
-
- Running ditroff on a ds3100 results in:
- Bad user TLB fault in process 22b32: pc=40f6b4 addr=ffff436
- being printed to the syslog and the process hanging.
-
-
- 332.
- Date: Thu, 24 Aug 89 16:39:47 PDT
- From: brent (Brent Welch)
- Subject: Attributes and devices
-
- Attribute handling is still not perfectly implemented.
- This summarizes what happens and what ought to change.
-
- 1 - If you stat() a file that is being executed, the
- kernel reports that the access time is "now".
- This time does not get propagated back to the
- file descriptor on disk, so the access time
- can appear to change.
-
- 2 - While the device I/O servers maintain an access and
- modify time, this is not pushed back to the
- file descriptor. This means that only activity
- on mint's console will be remembered (maybe).
-
- 3 - Clients do not set the access and modify time when
- a file is created. The file server's time is used.
- A client does set a modify time when it closes
- a file, but the server will set the modify time
- of a write-through (non-cachable) file. The fix
- to this requires changing the RPC parameters to
- OPEN and WRITE to include a modify time, and to
- add an access time to the READ RPC parameters.
-
-
-
- 333.
- Date: Thu, 24 Aug 89 16:42:24 PDT
- From: brent (Brent Welch)
- Subject: Symbolic link format
-
- Sprite adds a null to the end of the file name stored
- in a symbolic link, while Unix does not.
- Also, there is no domain-specific SYMLINK operation.
- Instead, a symbolic link is created (a la mknod),
- and then a value is written using the domain-specific
- WRITE procedure. This means that you can create
- a Sprite-format symbolic link on a Unix file server
- via nfsmount, oops. This also means you can create
- zero-length symbolic links if the disk is full.
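readlink() returns the bytes of a link's value without adding a terminator, so a reader that has to cope with both formats could drop a single trailing NUL. A minimal sketch; `link_target_length` is a hypothetical helper, not part of Sprite or Unix.

```c
#include <stddef.h>

/*
 * Given the raw bytes of a symbolic link's value (as returned by
 * readlink(), which does not NUL-terminate), return the length of the
 * path itself, ignoring a trailing NUL written by Sprite-style links.
 */
size_t link_target_length(const char *value, size_t len)
{
    if (len > 0 && value[len - 1] == '\0') {
        return len - 1;
    }
    return len;
}
```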
-
-
- 334.
- Date: Thu, 24 Aug 89 16:43:53 PDT
- From: brent (Brent Welch)
- Subject: Removes with disk full
-
- The file servers do not behave well when the disk fills up.
- In particular, removes seem to fail, or at least hang.
- I suspect that the cache gets completely dirty so that
- indirect blocks cannot be read in, and this hangs the
- remove which needs to read the indirect block.
-
-
- 335.
- Date: Thu, 24 Aug 89 16:45:25 PDT
- From: brent (Brent Welch)
- Subject: pseudo-device pointer bug
-
- On the ds3100 and sun4 machines there are occasional
- pseudo-device pointer errors. The firstByte index
- into the request buffer is not pointing to the
- required magic value. There is probably a bug
- relating to rounding sizes up to 4-byte boundaries.
- This is killing ipServer and rlogind processes.
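The suspected rounding bug comes down to both sides of the pseudo-device agreeing on where the next request starts. A sketch of consistent 4-byte rounding; `next_request_offset` is hypothetical and does not reflect the actual pdev request layout.

```c
#include <stddef.h>

/* Round n up to the next multiple of 4. */
#define ROUND_UP4(n) (((size_t)(n) + 3) & ~(size_t)3)

/*
 * Hypothetical: offset of the next request in a shared buffer, given
 * where the current one starts and its unpadded size. Writer and
 * reader must both use this; if only one side rounds, the reader's
 * index lands short of the next request's magic value.
 */
size_t next_request_offset(size_t firstByte, size_t requestSize)
{
    return ROUND_UP4(firstByte + requestSize);
}
```

For example, a 13-byte request starting at offset 4 ends at 17, so the next one starts at 20; a side that forgets to round looks for the magic value at 17.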
-
-
- 336.
- Date: Thu, 24 Aug 89 16:46:12 PDT
- From: brent (Brent Welch)
- Subject: chmod symbolic link loop
-
- chmod 755 /sprite/src/kernel
- generates the error: too many levels of symbolic link
- while
- chmod 755 /sprite/src/kernel/
- works ok.
-
-
- 337.
- Date: Thu, 24 Aug 89 16:55:30 PDT
- From: jhh (John H. Hartman)
- Subject: malloc semantics
-
- Malloc() should return NULL if more memory cannot be allocated. The current
- behavior is to kill the process. A variable should be provided that allows
- a process to make malloc behave in either fashion.
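A minimal sketch of the suggested switch, written as a wrapper so it doesn't depend on libc internals. `malloc_exits_on_failure` and `checked_malloc` are hypothetical names; a real change would live inside libc's malloc itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: nonzero means exit on allocation failure
 * (the current behavior), zero means return NULL to the caller. */
int malloc_exits_on_failure = 0;

void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && malloc_exits_on_failure) {
        fprintf(stderr, "malloc: out of memory (%lu bytes)\n",
                (unsigned long)size);
        exit(1);
    }
    return p;
}
```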
-
-
- 338.
- Date: Thu, 24 Aug 89 17:20:48 PDT
- From: brent (Brent Welch)
- Subject: migration offset
-
- The stream offset is probably being screwed up during migration.
- This can explain the problems with pmake's shell scripts
- getting apparently garbled.
-
-
- 339.
- Date: Thu, 24 Aug 89 17:22:18 PDT
- From: brent (Brent Welch)
- Subject: mail with no /tmp
-
- The previous empty mail message was generated when /tmp
- was down. I'm not sure this is worth trying to fix.
- However, my mail session looked like:
-
- <sage 208> mail bugs
- Subject: migration offset
- The stream offset is probably being screwed up during migration.
- This can explain the problems with pmake's shell scripts
- getting apparently garbled.
-
-
- 340.
- Date: Tue, 29 Aug 89 08:31:38 PDT
- From: ouster (John Ousterhout)
- Subject: Bug in wall
-
- Brent's wall message about Oregano going down did not ever appear
- on Mace's syslog (but I saw it on Piracy's console). Furthermore,
- the test wall message yesterday had the same behavior. For some
- reason wall must be stopping part-way through the list of hosts
- (an error of some sort?).
-
-
- 341.
- Date: Tue, 29 Aug 89 08:43:12 PDT
- From: brent (Brent Welch)
- Subject: syslog reopening
-
- John's message about a bug in wall is really about
- a bug in reopening /dev/syslog. Mendel noticed
- yesterday that after he rebooted he couldn't
- cat /dev/syslog. This was due to a bug in the
- /dev/syslog reopen procedure. It always thought
- the device was being reopened for reading, which
- breaks things because it is a single-reader device.
- After a reopen, /dev/syslog could never be opened
- for reading. I've fixed this in my kernel (BW.106)
- and will install a new dev module.
- brent
- ps. (BW.106 is a sun3 kernel)
- pps. This also works around the bug described in #147 "device reopen bug"
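The reopen bug can be illustrated with a single-reader device that claims its reader slot only when read access is actually requested. The flag values, `SyslogState`, and both functions are hypothetical, not the dev module's code.

```c
#include <stdbool.h>

#define DEV_READ  0x1   /* hypothetical open-flag bits */
#define DEV_WRITE 0x2

typedef struct {
    bool hasReader;     /* a single-reader device allows only one */
} SyslogState;

/*
 * Open or reopen the device. Only a request that includes read access
 * claims the reader slot; the buggy reopen behaved as if every reopen
 * asked for reading, so no real reader could get in afterwards.
 * Returns 0 on success, -1 if another reader already holds the device.
 */
int syslog_open(SyslogState *s, int flags)
{
    if (flags & DEV_READ) {
        if (s->hasReader) {
            return -1;
        }
        s->hasReader = true;
    }
    return 0;
}

void syslog_close(SyslogState *s, int flags)
{
    if (flags & DEV_READ) {
        s->hasReader = false;
    }
}
```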
-
-
- 342.
- Date: Tue, 29 Aug 89 09:31:14 PDT
- From: Fred Douglis <douglis>
- Subject: Re: syslog reopening
-
- if wall only made it part-way through, it could be related to the hanging
- rlogin pdev problem I reported yesterday. wall didn't used to try and
- open rlogins, which was a problem as well. by the way, brent, that explains
- the problem with murder: wall used to have a bug in which it wouldn't close
- any of its streams until it exited, so if it hung up in an unkillable
- state trying to open a pdev it would have references on all the syslogs
- it ever opened. i already fixed that and it's in the installed version.
-
-
- 343.
- Date: Tue, 29 Aug 89 10:42:23 PDT
- From: ouster (John Ousterhout)
- Subject: Piracy crash again
-
- Piracy crashed just now as I was attempting to rlogin from mace.
- The console message is:
-
- Bad kernel TLB fault
- Syncing disks. Version: SPRITE VERSION 1.002 (ds3100) (20 Aug 89 18:20:10)
- Entering debugger with a TLB LD miss exception at PC 0x800ab804
-
- I'll leave the corpse around in case anyone wants to take a look
- at it.
-
-
- 344.
- Date: Tue, 29 Aug 89 12:16:17 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 rpn hex broken
-
- printing large numbers using rpn prints 7fffffff instead of 8nnnnnnn.
- BTW, Mike says this is broken under ultrix as well as sprite.
-
-
- 345.
- Date: Tue, 29 Aug 89 12:32:16 PDT
- From: Fred Douglis <douglis>
- Subject: Re: Piracy crash again
-
- i debugged it, then talked to mike. unfortunately, he only found what i
- found: that the tlb fault happened when a load used a register that
- had a zero value, even though that register was the target of an add of
- non-zero values in the previous instruction. Although the status
- register indicated interrupts were off, it's just too suspicious. I
- suggested that we put in a mousetrap to check for
- mach_NumDisableIntrs > 0 || sys_AtInterruptLevel when taking an
- interrupt. Mike: have you
- already made this change, or should I make a stab at it? (I'm a bit
- worried about using the wrong registers at the wrong time, which is
- why I ask.)
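The proposed mousetrap condition, sketched as a check to run on interrupt entry. The globals here are stand-ins with assumed types, not the kernel's declarations, and the check must run before the handler marks itself as being at interrupt level.

```c
#include <stdbool.h>

/* Stand-ins for the kernel globals named above; types are assumed. */
int mach_NumDisableIntrs = 0;   /* nesting depth of interrupt disables */
int sys_AtInterruptLevel = 0;   /* nonzero while handling an interrupt */

/*
 * Call on interrupt entry, before setting sys_AtInterruptLevel.
 * Returns true if interrupts were supposedly disabled, or we were
 * already at interrupt level, so taking this interrupt means a bug.
 */
bool interrupt_is_unexpected(void)
{
    return mach_NumDisableIntrs > 0 || sys_AtInterruptLevel;
}
```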
-
-
- 346.
- Date: Tue, 29 Aug 89 14:24:37 PDT
- From: rab (Robert A. Bruce)
- Subject: swap
-
- /sprite/lib/c/net/swap.c is screwed up. If the host machine is
- little-endian all the byte swapping appears to be correct. But if
- the host is big-endian the routines all return random garbage off
- of the stack.
-
- For big-endian machines the net swap routines should be macros that
- perform a nop, but if someone fails to include the header file the
- routines should still work correctly.
-
- A second problem is that there is no RCS file for swap.c.
-
- I will fix the swap routines and install the new version.
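One way to make the out-of-line routines correct on both byte orders is to rebuild the value from its bytes in memory instead of testing the host's endianness. This is a substitute technique, not what swap.c does; `net_to_host32` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Convert a 32-bit value from network (big-endian) byte order to host
 * order. Reading the bytes in memory order and reassembling them as
 * big-endian is a no-op on big-endian hosts and a swap on
 * little-endian ones, without any #ifdef.
 */
uint32_t net_to_host32(uint32_t net)
{
    const unsigned char *b = (const unsigned char *)&net;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```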
-
-
- 347.
- Date: Tue, 29 Aug 89 14:27:14 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Allspice crash
-
- Allspice crashed for the second time with a level 15 interrupt (asynchronous
- cache write-back error). This is totally disgusting, because the address
- it was trying to write back to was bogus (in the middle of the hole in
- the virtual address space). I don't yet even know how you could get an address
- like that into the cache in the first place. I'm currently investigating this.
- It's one bit different from a valid address in the intel page, but that's marked
- as non-cacheable.
-
- Anyway, if allspice crashes again with this, and I'm not here, could whoever
- debugs it please record the value of the global registers for me? Thanks.
- They contain interesting information on this kind of error.
-
-
- 348.
- Date: Tue, 29 Aug 89 15:16:16 PDT
- From: rab (Robert A. Bruce)
- Subject: readdir
-
- There was a bug in readdir() that caused it to swap bytes incorrectly.
- Because of this, programs that use readdir did not work correctly
- when accessing disks mounted on little-endian machines.
-
- I fixed the bug and I have recompiled `ls', `sh' and `csh', so they
- work correctly now. Other programs that use readdir still need to
- be recompiled.
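To see how a byte-order mistake in readdir shows up, swap a 16-bit directory-entry field such as the name length: a short name's length becomes a large number. That is one way an assertion like `dp->d_namlen <= 255` could fail, though nothing here establishes that this is what happened. `swap16` is a hypothetical helper.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
```

For example, a 5-character name length swapped by mistake reads as 0x0500, i.e. 1280.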
-
-
- 349.
- Date: Tue, 29 Aug 89 15:31:06 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: thyme won't boot
-
- Thyme won't boot because mint won't answer its broadcast for "/".
- Last Friday I changed the routing between mint and thyme to use
- the IP protocol in an attempt to debug what was happening over in
- cory. When I was done I ran 'netroute -f /etc/spritehosts' on
- mint, but this didn't fix things. 'netroute -p' prints out the
- correct routing information for thyme. 'rpcstat -trace' shows that
- mint thought it responded to the request. 'etherfind' does not show
- mint sending any IP packets.
-
-
- 350.
- Date: Tue, 29 Aug 89 16:23:06 PDT
- From: Fred Douglis <douglis>
- Subject: /tmp trashed??
-
- try doing an "ls /c/tmp"
-
- on sun4s i get a segv. on a ds3100 i get "assertion failed: line 49
- of readdir.c". on sun3s i get "dp->d_namlen <= 255" explicitly stated
- as an assertion failure.
-
- time to reboot oregano and check its disks???
-
-
- 351.
- Date: Tue, 29 Aug 89 16:25:30 PDT
- From: brent (Brent Welch)
- Subject: sun4 compiler
-
- The sun4 cc1.space dies with
- error: ldexp
- when compiling in my ~brent/idleTime directory.
- I have had other successes with the sun4 compiler, however,
- so I encourage people to still try compiling things.
- brent
- ps. The file it fails on is print.c
-
-
- 352.
- Date: Tue, 29 Aug 89 16:25:41 PDT
- From: Fred Douglis <douglis>
- Subject: sun4 rdist missing
-
- the program doesn't exist. rdist.prog/sun4.md was empty except for
- md.mk, and when i tried to do mkmf, the makedepend went into an
- infinite loop.
-
-
-
- 353.
- Date: Tue, 29 Aug 89 16:31:00 PDT
- From: brent (Brent Welch)
- Subject: Re: ls problems in /tmp
-
- A new ls was installed today.
- It probably can't choke down something
- in /tmp. I don't think the directory is messed up.
- Let us debug ls first.
-
-
- 354.
- Date: Tue, 29 Aug 89 18:04:47 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: prof file open bug
-
- Did I already report this? The prof module was opening the dump output file
- without the truncate flag, so crud could be left at the end of the file
- that gprof would die on. It's been fixed. Next time we do an install,
- everyone will see the fix.
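In POSIX terms the fix amounts to passing O_TRUNC when opening the output file, so a shorter dump doesn't leave old bytes past its end. The prof module uses Sprite's own kernel open path, so this is only an analogy; `open_dump_file` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open (creating if needed) a profile dump file, discarding any old
 * contents so no stale bytes remain past the new end of file.
 * Returns a file descriptor, or -1 on error. */
int open_dump_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```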
-
-
-
- 355.
- Date: Tue, 29 Aug 89 18:22:50 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: tftp daemon problem?
-
- I was unable to reboot anise because the tftp daemon wasn't running on mint.
- There was no daemon in the debugger, though. Would it just exit, or did
- somebody kill it?
-
-
-
- 356.
- Date: Tue, 29 Aug 89 18:30:47 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: newly installed sun4 csh broken
-
- The newly-installed sun4 csh is broken. It dies when you try to login to a
- sun4, because it has a bad stack pointer. I would back it out, but it appears
- that whoever installed it overwrote the backup csh in the cmds.old area with the
- csh that causes ls to die on command completion sometimes. So, I guess I'll
- move csh to csh.bad and put a copy of the older bad csh in cmds.sun4 as the
- current csh. This at least will allow you to login. It's a pity that the
- person who installed it didn't try it out before installing it. With something
- as major as csh, this might be a good idea?
-
-
-
- 357.
- Date: Tue, 29 Aug 89 18:36:17 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Ugh, it's my fault
-
- Well, it seems I've done something totally bizarre to anise. The sun4 csh
- works just fine on allspice. I'll do some debugging and eventually maybe
- be able to remove my foot from my mouth.
-
-
- 358.
- Date: Wed, 30 Aug 89 09:47:16 PDT
- From: brent (Brent Welch)
- Subject: mint boots sprite on homer
-
- The folks in 608-1 complained that homer was running Sprite.
- Indeed, mint was beating ginger to the punch and supplying
- it with a Sprite kernel. How do we enforce control over
- tftp booting? Only with the symbolic links set up in
- /sprite/boot? If so, we should be careful about setting
- up links for not-normally-sprite-hosts in /sprite/boot.
- For now, I'm booting homer with
- ie(0,961c)
- to force it to run UNIX
-
-
- 359.
- Date: Wed, 30 Aug 89 10:33:55 PDT
- From: brent (Brent Welch)
- Subject: /sprite/boot up-to-date
-
- I went through /sprite/boot and removed a few symbolic
- links that correspond to machines no longer running sprite.
- This includes:
- homer (128.32.150.50 a.k.a. 80209632)
- turmeric (128.32.150.37 a.k.a. 80209625)
- bay (128.32.150.18 a.k.a. 80209612)
- tully (128.32.150.44 a.k.a. 8020962C)
- I also see a link for 80209C68.SUN4, which is an unused
- address on the 156 net (cory). This is probably for
- raid, but raid isn't in the host tables I see.
- There is also a link for 80209c95, which corresponds
- to ponca, except that the 'c' probably needs to
- be capitalized, and I don't know if tftp booting
- works through the gateway(s) or not.
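The hex names listed above are the four octets of each address printed as two hex digits apiece (128.32.150.50 becomes 80 20 96 32). A sketch for generating them, which a script for adding and removing boot links could use; `boot_name` is hypothetical, and uppercase digits are assumed based on the note about 80209c95 (128.32.156.149).

```c
#include <stdio.h>

/*
 * Build the 8-digit uppercase hex name for a dotted-quad IPv4 address,
 * e.g. "128.32.150.50" -> "80209632". Uppercase is assumed to be what
 * the tftp links need. Returns 0 on success, -1 on a malformed address.
 */
int boot_name(const char *dotted, char out[9])
{
    unsigned int a, b, c, d;
    char extra;

    if (sscanf(dotted, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4 ||
        a > 255 || b > 255 || c > 255 || d > 255) {
        return -1;
    }
    snprintf(out, 9, "%02X%02X%02X%02X", a, b, c, d);
    return 0;
}
```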
-
-
- 360.
- Date: Wed, 30 Aug 89 10:36:09 PDT
- From: Fred Douglis <douglis>
- Subject: Re: /sprite/boot up-to-date
-
- this is rather awkward, since we might occasionally want to boot
- sprite on different hosts. perhaps we could have a script like the
- one on ultrix to add/remove hosts automatically.
-
-
-
- 361.
- Date: Wed, 30 Aug 89 12:49:20 PDT
- From: Fred Douglis <douglis>
- Subject: new xdvi installed... color support, but doesn't run native on ds3100
-
- I picked up some patches from comp.sources.x for xdvi. It now runs on
- a sun3 using a color ds3100 display (it already worked for B&W).
- However, it doesn't run native on a ds3100 -- I presume it has
- byte-ordering problems. I'm inclined not to fix it, since I imagine
- someone else will as there have been regular updates.
-
- If I broke anything else with the new install, let me know.
-
-
-
- 362.
- Date: Thu, 31 Aug 89 00:01:41 PDT
- From: Fred Douglis <douglis>
- Subject: Re: xgone complaint
-
- well, the default is to prompt for a password, i guess. actually,
- i thought i'd changed that, but maybe not. i'll check. in any case,
- there's an option to disable it, and you should be able to rlogin and
- kill it, can't you?
-
-
-
- 363.
- Date: Wed, 30 Aug 89 14:19:07 PDT
- From: Fred Douglis <douglis>
- Subject: another ds3100 crash (malloc)
-
- piracy died with a bogus value (0x54) in its freelist. nothing too
- terribly obvious, except i did notice that cardamom had recently
- rebooted and it was in a migration-related call for something with
- home node cardamom. does anyone know what cardamom was doing when it
- was rebooted? i wonder if something got freed too soon, or something.
-
- p.s. dave culler said that piquante was just as unstable running
- ultrix as running sprite: xterm would die about once/day and the
- kernel itself would crash periodically.
-
-
-
- 364.
- Date: Wed, 30 Aug 89 17:19:46 PDT
- From: Fred Douglis <douglis>
- Subject: assault hardware problem?
-
- for the record: assault died a little while ago with something called
- a "bus error". however, the address was 0xc0c019d4, and 0xc0c019d0
- and d8 were perfectly valid. apparently, a bus timeout can occur on a
- parity error, which is a possible cause of assault's problem.
-
-
-
- 365.
- Date: Wed, 30 Aug 89 22:26:27 PDT
- From: jhh (John H. Hartman)
- Subject: xgone complaint
-
-
- Several times I have been confronted by machines running xgone that
- insist I type in the password for the person that started it running.
- It would be nice if xgone could be killed or the password feature
- disabled.
-
-
-
- 366.
- Date: Thu, 31 Aug 89 09:48:49 PDT
- From: ouster (John Ousterhout)
- Subject: Another Piracy Crash
-
- This time the message was:
-
- Fatal Error: Page number outside bounds of corePtr->virPage.page table
- Syncing disks Version: SPRITE VERSION 1.002 (ds3100) (20 Aug 88 18:20:10)
- Entering debugger with a Breakpoint trap exception at PC 0x800bc6d8
-
- The corpse is available for debugging.
-
-
- 367.
- Date: Thu, 31 Aug 89 11:02:48 PDT
- From: Fred Douglis <douglis>
- Subject: Re: Another Piracy Crash
-
- this is similar to a crash i looked at before: Vm_Clock was trying to
- clean a page that didn't belong in the segment it pointed to. The
- segment was an inactive code segment with 17 pages (16 resident), and
- corePtr->virtPage referenced page 1037. so for example, if the
- virtPage page number had an extra bit set accidentally, it could have
- really meant to reference 0xd instead of 0x40d and it would be a
- perfectly reasonable page. Just a thought...
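The arithmetic behind this guess can be checked directly: 1037 is 0x40d, and 0x40d XOR 0xd is 0x400, a single bit (bit 10), while page 0xd (13) would fit in a 17-page segment. A small helper that tests whether two page numbers differ in exactly one bit; the name is hypothetical.

```c
#include <stdbool.h>

/* True if a and b differ in exactly one bit position. */
bool differs_by_one_bit(unsigned int a, unsigned int b)
{
    unsigned int x = a ^ b;
    /* x is a nonzero power of two iff clearing its lowest set bit leaves 0 */
    return x != 0 && (x & (x - 1)) == 0;
}
```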
-
-
-
- 368.
- Date: Thu, 31 Aug 89 11:20:10 PDT
- From: brent (Brent Welch)
- Subject: Too many system calls
-
- The "Too many system calls" error should not be a panic, I think,
- because the problem occurs very early during bootstrap.
- Can't it just print out a warning and ignore the
- rest of the kernel calls?
-
-
-
- 369.
- Date: Thu, 31 Aug 89 11:52:55 PDT
- From: ouster (John Ousterhout)
- Subject: The slows
-
- Something related to Sprite seems to have "the slows" this morning.
- I suspect Allspice, because that's where the files are that I'm
- compiling. The symptoms are that a compile takes a VERY long time,
- and the status line printed afterwards shows only 10-15% utilization
- of the CPU. Also, I've noticed occasional RPC timeout messages about
- allspice. The problem has come and gone a couple of times this
- morning.
-
-
-
- 370.
- Date: Thu, 31 Aug 89 12:21:49 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 ultrix weirdness
-
- a comment on *ultrix* weirdness. (i asked david if he's seen the same
- thing on piquante since it switched to sprite.)
-
- ------- Forwarded Message
-
- Date: Thu, 31 Aug 89 12:15:27 -0700
- From: david@fennel.berkeley.edu (David A. Wood)
- To: root@fennel.Berkeley.EDU
- Subject: ds3100 (ultrix) NFS weirdness
-
- I have been running my cache simulator on greed and piquante and
- occasionally get some bogus results. They are very small errors, usually
- an extra line or two in the input file, but it is somewhat disconcerting.
- Has anyone else been experiencing these problems??
-
- --david
-
- ------- End of Forwarded Message
-
-
-
- 371.
- Date: Thu, 31 Aug 89 12:30:10 PDT
- From: Fred Douglis <douglis>
- Subject: page-in error can kill kernel
-
- paprika crashed yesterday with a bus error. turns out Proc_Exec made
- an argument array accessible, then hit a bus error referencing it.
- this was about the time that allspice crashed; i got an "Fs_PageRead
- waiting" message, and i hit ^C to interrupt the exec. looks like
- Vm_MakeAccessible needs to lock down the page rather than relying on
- the same Vm_Copy check, since on a page-in error the kernel can only
- choose between dying and returning an error that will never be passed
- back to the routine accessing the data. at least, a page-in error is the
- only thing I can think of to account for the kernel dying.
- Suggestions from the VM experts??
-
-
- 372.
- Date: Thu, 31 Aug 89 12:13:37 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Re: The slows
-
- Yesterday when I went to check on allspice's slowness, messages on the
- console showed it had been blasted with rpc version mismatches. This happened,
- it seemed, just when assault was booted (and unbooted quickly, since it
- died real soon). I reset the network interface and this helped to some extent,
- since mint could talk to allspice again where it hadn't just before. Maybe
- something worse is going on. Whatever it is, it only seems to pick on certain
- client machines at a time.
-
-
-
- 373.
- Date: Thu, 31 Aug 89 12:38:52 PDT
- From: Fred Douglis <douglis>
- Subject: Re: Too many system calls
-
- I've changed *.md/machCode.c to handle inconsistencies a little
- better. It prints warnings for too many system calls (ignoring the
- extra) or too many arguments (ditto). It also prints a warning for
- out-of-order call initialization. Mendel says that should be a panic
- still, but the problem (as we've seen) is that it's too early to
- panic. I'm open to suggestions.
-
- I'm recompiling now.
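- Here is a minimal sketch of the warn-and-continue policy described above.
- The table size, the argument limit and init_sys_call() are all
- hypothetical stand-ins; the real code is in *.md/machCode.c. The sketch
- reads "ignoring the extra" arguments as truncating the count to the
- limit. That reading is an assumption.
-
```c
#include <stdio.h>

/* Hypothetical limits for illustration; the real ones are in the kernel headers. */
#define MAX_SYS_CALLS 4
#define MAX_SYS_ARGS  6

typedef struct {
    const char *name;
    int numArgs;
    int used;
} SysCallEntry;

static SysCallEntry sysCallTable[MAX_SYS_CALLS];
static int nextCallNum = 0;   /* call number expected next */

/* Install one system call.  This runs too early in bootstrap to panic
 * usefully, so inconsistencies produce warnings instead.  Returns 1 if
 * the entry was installed, 0 if it was ignored. */
static int init_sys_call(int callNum, const char *name, int numArgs)
{
    if (callNum != nextCallNum) {
        fprintf(stderr, "Warning: %s initialized out of order (%d, expected %d)\n",
                name, callNum, nextCallNum);
    }
    nextCallNum = callNum + 1;

    if (callNum < 0 || callNum >= MAX_SYS_CALLS) {
        fprintf(stderr, "Warning: too many system calls; ignoring %s\n", name);
        return 0;
    }
    if (numArgs > MAX_SYS_ARGS) {
        fprintf(stderr, "Warning: %s has %d arguments; ignoring all past %d\n",
                name, numArgs, MAX_SYS_ARGS);
        numArgs = MAX_SYS_ARGS;
    }
    sysCallTable[callNum].name = name;
    sysCallTable[callNum].numArgs = numArgs;
    sysCallTable[callNum].used = 1;
    return 1;
}
```
- Whether out-of-order initialization should still panic (Mendel's view)
- is left open here. The sketch only warns.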
-
-
-
- 374.
- Date: Thu, 31 Aug 89 13:50:32 PDT
- From: mendel (Mendel Rosenblum)
- Subject: sprintf man page incorrect
-
- The man page for sprintf says:
-
- RETURN VALUE
- The functions all return the number of characters printed,
- or -1 if an error occurred.
-
- This is incorrect.
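- Until the man page is fixed, one defensive pattern is to measure the
- formatted string with strlen() instead of relying on sprintf's return
- value. format_count() is a hypothetical helper used only for
- illustration.
-
```c
#include <stdio.h>
#include <string.h>

/* Format label followed by n into buf and return the length of the
 * result.  The length comes from strlen(), not from sprintf's return
 * value, so the code does not depend on what sprintf returns. */
static size_t format_count(char *buf, const char *label, int n)
{
    sprintf(buf, "%s%d", label, n);
    return strlen(buf);
}
```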
-
-
-
- 375.
- Date: Thu, 31 Aug 89 17:08:03 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: rcp hangs on ds3100
-
-
- Rcp of a kernel from a ds3100 to dill hangs.
-
-
- 376.
- Date: Fri, 01 Sep 89 00:14:35 PDT
- From: rab (Robert A. Bruce)
- Subject: Allspice crashed
-
- Allspice crashed with the following error:
-
- Fatal Error: Page number outside bounds of pagetable
- Entering debugger with a Interrupt trap (16) exception at PC 0xf6081320
-
- Jhh tried to debug it but couldn't because it was running sun4 instead
- of sun4.new and the sources aren't available.
-
- The bug is repeatable by running gdb and trying to stepi through
- an instruction that destroys the stack pointer.
-
-
-
- 377.
- Date: Thu, 31 Aug 89 20:10:27 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: printenv doesn't take arguments
-
- On unix machines such as rosemary, printenv will take arguments so that you
- can say
- "printenv TERM"
- and get the answer
- "tx"
- rather than the answer
- "printenv doesn't take any arguments; "TERM .." ignored."
- and then your whole environment.
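- Below is a sketch of the BSD-style behavior Mary describes. It assumes
- printenv should print the value of each named variable and fall back to
- the whole environment when it gets no arguments. printenv_main() is a
- hypothetical name, and the exit status of 1 for an unset variable is an
- assumption.
-
```c
#include <stdio.h>
#include <stdlib.h>

extern char **environ;

/* printenv-style behavior: with no names, print the whole environment;
 * with names, print only the value of each one that is set.
 * Returns 0 if every requested variable was found, 1 otherwise. */
static int printenv_main(int argc, char *argv[], FILE *out)
{
    int status = 0;

    if (argc < 2) {
        for (char **ep = environ; *ep != NULL; ep++)
            fprintf(out, "%s\n", *ep);
        return 0;
    }
    for (int i = 1; i < argc; i++) {
        const char *value = getenv(argv[i]);
        if (value != NULL)
            fprintf(out, "%s\n", value);
        else
            status = 1;          /* name not set: print nothing for it */
    }
    return status;
}
```
- With this behavior, "printenv TERM" prints just the value (e.g. "tx").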
-
-
-
- 378.
- Date: Thu, 24 Aug 89 17:22:18 PDT
- From: brent (Brent Welch)
- Subject: mail with no /tmp
-
- The previous empty mail message was generated when /tmp
- was down. I'm not sure this is worth trying to fix.
- However, my mail session looked like:
-
- <sage 208> mail bugs
- Subject: migration offset
- The stream offset is probably being screwed up during migration.
- This can explain the problems with pmake's shell scripts
- getting apparently garbled.
-
-
-
- 379.
- Date: Thu, 24 Aug 89 17:24:40 PDT
- From: brent (Brent Welch)
- Subject: mail with no /tmp
-
- (This supersedes the previous message.)
- I tried to send mail while oregano was down.
- After I ended the mail session by typing
- . (on a line by itself :)
- I got:
- EOT
- Null message body; hope that's ok
- read: stale remote file handle
-
- And an empty message with no subject line
- was generated.
- brent
-
-
- 380.
- Date: Thu, 24 Aug 89 17:51:22 PDT
- From: brent (Brent Welch)
- Subject: Oregano hung for 5 minutes
-
- My consistency timeout kicked in today.
- The timeout period is 5 minutes in order to
- allow a client with a large dirty cache
- plenty of time for a write-back. However,
- 5 minutes is enough time for everyone to
- think there is a major problem. I almost
- had Oregano in the debugger when the timeout
- message appeared on the console and things
- fixed themselves up quite nicely. How about
- a shorter timeout?
-
-
- 381.
- Date: Thu, 24 Aug 89 21:25:49 PDT
- From: rab (Robert A. Bruce)
- Subject: mkmf
-
- I tried to re-mkmf the library directory but mkmf generated bogus
- makefiles. Make issues the following complaints:
-
- "Makefile", line 29: Undefined variable "$ "
- "/sprite/lib/pmake/biglib.mk", line 64: Undefined variable "$ "
- ...
- "/sprite/lib/pmake/tm.mk", line 23: Undefined variable "$ "
- ...
-
- The offending line in the Makefile is:
-
- TM ?= $ {defTarget:q}
-
- At first I thought that there was just an extra space after
- the $, but when I removed it I got these messages:
-
- pmake: Unknown modifier 'q'
- "Makefile", line 29: Undefined variable "${defTarget:q}"
- pmake: Unknown modifier 'q'
- "/sprite/lib/pmake/biglib.mk", line 64: Undefined variable "${defTarget:q}"
- pmake: Unknown modifier 'q'
- ...
-
-
-
- 382.
- Date: Thu, 24 Aug 89 21:50:16 PDT
- From: Fred Douglis <douglis>
- Subject: Re: mkmf
-
- oops. there was a typo in mkmf.biglib. the extra space was in the
- mkmf script, not in the makefile. it's fixed now.
-
-
-
- 383.
- Date: Fri, 25 Aug 89 08:40:27 PDT
- From: ouster (John Ousterhout)
- Subject: Piquante won't boot
-
- David Culler has been trying unsuccessfully to boot piquante this
- morning. After the command "boot -f tftp()", the following messages
- appear:
-
- TFTP Error: 1 (file not found)
- TFTP Error: 1 (file not found)
- TFTP Error: 1 (file not found)
- TFTP Error: 1 (file not found)
- couldn't load tftp
-
- Can someone who understands ds3100's better than I do (Bob? Fred?)
- give David a hand in getting his machine booted again? Thanks.
-
- -John-
-
- P.S. I'm wondering if the problem is a well-intentioned Ultrix
- TFTP daemon responding to the broadcast before Sprite does.
-
-
- 384.
- Date: Fri, 25 Aug 89 08:51:06 PDT
- From: Fred Douglis <douglis>
- Subject: Re: Piquante won't boot
-
- I get that any time I try to boot with tftp without saying "init" to
- the prom beforehand. Had he tried that?
-
-
-
- 385.
- Date: Fri, 25 Aug 89 08:58:18 PDT
- From: ouster (John Ousterhout)
- Subject: Re: Piquante won't boot
-
- At your suggestion I tried "init", but it didn't work. I also tried
- power-cycling the machine, which also didn't help.
-
-
- 386.
- Date: Sun, 27 Aug 89 21:21:41 PDT
- From: Fred Douglis <douglis>
- Subject: debugging hosts
-
- did anyone have a chance to poke around murder in the debugger before
- rebooting it? i need to look in the debugger any time something like
- this happens.
-
- also, it would be very useful for bug reports to say not only which
- hosts are involved with a problem, but which kernels they are running.
- Having monotonically increasing version numbers is a wonderful idea
- because it makes it much easier to identify kernels. I noticed that
- Brent set up his own directory to do something similar, so I copied
- his Makefile setup to my own; for example, right now I'm running
-
- Kernel version: SPRITE VERSION FD.001 (ds3100) (25 Aug 89 18:34:42)
-
-
-
- 387.
- Date: Fri, 25 Aug 89 10:36:09 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 stuff
-
-
- [john, sorry for the duplication due to my typo]
-
- i noticed piracy was in the debugger and tried to debug it. however,
- i couldn't find out which kernel it is running, because kmsg -v
- doesn't work, and i misguessed. you might as well reboot.
-
- also, brent and i had trouble finding the unstripped binary
- corresponding to the installed ds3100 kernel that dave culler is running.
- turns out someone removed it or overwrote it on sprite, but i had
- copied it to dill in the form "ds3100.new" a few days ago. we really
- need to be careful about keeping debuggable versions, especially on
- dill (in /sprite/src/kernel/nelson, at the moment, which is on dill's
- local disk).
-
- finally, are the rdists of kernel sources to unix being done
- automatically yet? dill mounts /sprite3 and i have set up the
- debugger search path to look there.
-
-
- 388.
- Date: Fri, 25 Aug 89 11:07:19 PDT
- From: culler (David Culler)
- Subject: IO error from EMACS
-
- When ``that evil editor'' (EMACS) tries to write a file to
- a pseudo-file system it gets an "IO error". Apparently this
- arises when EMACS tries to sync the file to make sure it was
- written; the write itself was successful.
-
-
-
- 389.
- Date: Fri, 25 Aug 89 11:48:56 PDT
- From: ouster (John Ousterhout)
- Subject: Kernel names in /sprite/src/kernel/sprite
-
- Perhaps all this has been fixed in the recent changes, but it
- used to be that each recompilation in /sprite/src/kernel/sprite
- moved the "current" kernel (e.g. sun3) to one with a date appended
- to its name. This is all fine, except that there was no obvious
- way to tell which of the many old sun3 kernels corresponded to
- what was installed as sun3.new, or, more importantly, sun3. Hence
- at one point I accidentally removed the only unstripped copy of the
- sun3 kernel while trying to clean up irrelevant binaries.
-
- Does the new naming scheme make it clear which unstripped kernels
- correspond to "official" versions? If not, it would be nice if it
- did.
-
-
-
- 390.
- Date: Fri, 25 Aug 89 12:11:52 PDT
- From: rab (Robert A. Bruce)
- Subject: Re: ds3100 stuff
-
- There is a shell script in /sprite/lib/misc/distfile.kernel to
- rdist the kernel sources. It isn't run from the crontab right now
- because there is a problem. When sprite attempts to find the size
- of a file on ginger it gets the wrong size, so every file is copied
- every time.
-
- I am not sure what the problem is. I suppose we could put it in
- the crontab anyway for now. Does anybody have any ideas as to
- why the sizes are getting screwed up?
-
-
-
-
- 391.
- Date: Sun, 27 Aug 89 21:46:44 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 getting repeated floating-point interrupt in kernel
-
- Garth commented that he crashed a couple of ds3100s (pepper and
- parsley) running his simulator on them. Turns out parsley was in the
- same state as pepper, but this time ^C followed by "run" in kdbx (not
- normally needed, I thought) let me poke around. It was in a
- panic due to an FP interrupt in kernel mode. This happened once
- before and Mike said to let him know if it happened again, I think.
-
- i'll mail the kdbx session to Mike in case it's of use. it includes
- mach_DebugState.
-
-
-
- 392.
- Date: Fri, 25 Aug 89 12:21:41 PDT
- From: rab (Robert A. Bruce)
- Subject: unkillable process
-
- The dump died last night. When I tried to restart it I got
- this error:
-
- Can't open /hosts/murder/dev/exabyte.norewind: text file or pseudo-device busy
-
- The process that has it open is
-
- 9112c WAIT 2:04 tar ncfT - -
-
- This process completely ignores `kill -DEBUG' and `kill -KILL'.
- The process is still alive on murder if anyone wants to
- look at it.
-
-
-
- 393.
- Date: Fri, 25 Aug 89 13:25:12 PDT
- From: brent (Brent Welch)
- Subject: Re: Kernel names in /sprite/src/kernel/sprite
-
- The Makefile saves the ${TM} kernel image in ${TM}.version
- at the end of the script. It would be easy to revert to
- leaving the kernel in ${TM} and doing the rename before you
- make the next version. We can vote on this at meeting.
-
-
-
- 394.
- Date: Fri, 25 Aug 89 14:08:43 PDT
- From: eklee (Edward K. Lee)
- Subject: gremlin
-
- I'm trying to run gremlin remotely using forgery's monitor but gremlin
- complains: "Couldn't open font file"
- I was able to run xdvi remotely.
-
-
- 395.
- Date: Fri, 25 Aug 89 15:41:14 PDT
- From: Fred Douglis <douglis>
- Subject: pmake garbling explained
-
- after looking carefully at the pmake output, we realized what was
- happening. the shell would read some commands, then suddenly start
- reading from the beginning again. We figured this had to be because
- of eviction. brent has found two shared-offset bugs so far, one for
- reading and one for writing, and i have a program that can recreate
- the problem, though only for full 4096-byte reads, not the smaller
- reads that sh does.
-
- anyway, thanks for the suggestions. my check for the file existing
- did catch the fact that there are occasionally leftover files in /tmp
- with the same process id, but as it turns out, "w+" truncates as well
- so that wasn't really the problem.
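- The "w+" point is easy to confirm. Reopening an existing file with mode
- "w+" truncates it to zero length, so a leftover /tmp file with the same
- name cannot leak old contents into the new one. size_after_w_plus() is a
- hypothetical test helper.
-
```c
#include <stdio.h>

/* Write text to path, then reopen the same path with "w+" and report
 * how many bytes the file holds immediately afterwards (-1 on error). */
static long size_after_w_plus(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs(text, f);
    fclose(f);

    f = fopen(path, "w+");       /* "w+" truncates an existing file to 0 bytes */
    if (f == NULL)
        return -1;
    fseek(f, 0L, SEEK_END);
    long size = ftell(f);
    fclose(f);
    return size;
}
```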
-
-
-
- 396.
- Date: Fri, 25 Aug 89 16:06:54 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: sprite rpc and gateways
-
-
- Right now it looks like the gateway between evans and cory is
- corrupting random words. We don't have a checksum mechanism to
- protect against this. Every time tonkawa boots at least one program
- contains an illegal instruction. As a result the spur cluster in
- Cory is unusable. I have set tonkawa up to use as much stuff off
- of its local disk as possible, but this is only a partial fix since
- some things still need to access /sprite.
-
-
-
- 397.
- Date: Fri, 25 Aug 89 18:03:16 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Question about file system cache
-
- I compiled a new test kernel for the sun4 in my kernel directory. In
- /sprite/boot I had a symbolic link to it. When I tried to reboot, tftp said
- there was no such file. But there was. It turns out the file system was
- full, although I got no write-back errors when I compiled the kernel. When I
- cleaned out some space elsewhere in the file system, tftp found the file.
-
- Shouldn't I see a message about write-backs not working? I probably don't
- understand what's going on, but I assume this all happened because the file
- was still in the client's fs cache. I guess there's nothing that can be done
- about it, but it seems a weird kind of caching to me if references to the file
- can't find what's cached for the file. Yeah, I know it's on a different
- machine, but the behavior still seems weird to me.
-
-
-
- 398.
- Date: Fri, 25 Aug 89 18:06:45 PDT
- From: Fred Douglis <douglis>
- Subject: Re: Question about file system cache
-
- As I just told Mary in person, the lack of a message is because the
- link took place on another host and the messages went to its syslog.
- We should figure out how to do something about this.
-
-
-
- 399.
- Date: Fri, 25 Aug 89 18:16:35 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: fix to #281
-
- There are new versions of rn, inews, Rnmail, and Pnews installed that
- fix bug #281 (among other things). For future reference, if someone
- installs rn and has to run the Configure script, tell it you don't
- want the programs to be portable. That will cause rn to think the host
- name is always 'sprite.berkeley.edu'.
-
-
-
- 400.
- Date: Fri, 25 Aug 89 18:46:42 PDT
- From: Fred Douglis <douglis>
- Subject: migration signal race condition (hopefully) fixed
-
- Brent reported earlier that a script he wrote to test shared offsets
- would often hang. I looked into it and found the problem was
- primarily in the check in Sig_Pause that would cause the user-level
- library to repeat Sig_Pause in the event that a migration signal was
- pending. In fact, it should only repeat if the *only* signal pending
- is migration related. In addition, while I was looking at potential
- causes, I realized there's a race condition when sending signals to a
- process that's about to migrate back home. I think I fixed that too,
- though there may still be a tiny window of vulnerability I'll have to
- investigate.
-
- Fixed in the uninstalled proc & sig for ds3100. I'll compile for the
- other machine types now.
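- The corrected Sig_Pause test amounts to a mask check. The bit
- assignments and function names below are hypothetical. They only show
- the difference between "some migration signal is pending" and "only
- migration signals are pending".
-
```c
#include <stdint.h>

/* Hypothetical bit assignments for pending signals. */
#define SIG_MIGRATE_TRAP   (1u << 0)
#define SIG_MIGRATE_HOME   (1u << 1)
#define SIG_MIGRATION_MASK (SIG_MIGRATE_TRAP | SIG_MIGRATE_HOME)

/* Buggy version: repeat the pause whenever any migration signal is
 * pending, even if an ordinary signal is also waiting. */
static int should_repeat_pause_buggy(uint32_t pending)
{
    return (pending & SIG_MIGRATION_MASK) != 0;
}

/* Fixed version: repeat only if migration signals are the *only*
 * signals pending. */
static int should_repeat_pause(uint32_t pending)
{
    return pending != 0 && (pending & ~SIG_MIGRATION_MASK) == 0;
}
```
- With an ordinary signal also pending, the buggy test keeps the process
- paused, which matches the hang in Brent's script.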
-
-
- 401.
- Date: Sun, 27 Aug 89 12:14:58 PDT
- From: ouster (John Ousterhout)
- Subject: Migration hangup
-
- Migration seems to have caused creeping paralysis in Mace this
- morning. I ran pmake, noticed that it wasn't doing anything,
- and also noticed the following message in my syslog window:
-
- RpcDoCall: <mig command> RPC to murder is hung
-
- Sure enough, murder seemed to be dead (no response to rlogins,
- for example). However, I was unable to control-C the pmake process
- (no response in the window where I typed control-C). I then tried
- "kill -KILL" on the "sh -ev" process that was hanging during migration,
- and that just hung the shell where I typed the kill. Finally, I
- typed "kmsg -d murder" in another window, at which point the following
- messages appeared in my syslog window, and everything cleaned itself
- up:
-
- <mig command> RPC exit 0x30002
- <mig command> 8/27/89 12:11:44 murder (17) RPC timed-out
- Warning: Proc_MigrateTrap: error encountered sending encapsulated state:
- no Reply to an RPC request within a threshold time limit.
- <mig command> 8/27/89 12:11:51 murder (17) RPC timed-out
-
- At this point I continued murder and everything seems OK, at least
- for now. What I don't understand is why I had to put Murder into
- the debugger before migration cleaned itself up.
-
-
- 402.
- Date: Sun, 27 Aug 89 12:57:53 PDT
- From: ouster (John Ousterhout)
- Subject: gdb not killing process: repeatable?
-
- I think I know how to reproduce the problem where gdb hangs while
- killing a process:
-
- 1. Start up gdb on a process. Get the process running, then get
- back into gdb, say, via a breakpoint.
-
- 2. Recompile the program being debugged.
-
- 3. Now go to the gdb process and type "kill". The kill will hang
- until the process is manually killed from some other window.
-
- I've been able to make this happen repeatably (in ~ouster/mipsim).
- I suspect that it might be a bug in gdb: I also noticed that gdb
- is unhappy if you remove the executable being debugged and then try
- to kill from within gdb: I got the message
-
- /user1/ouster/mipsim/sun3.md/mipsim: no such file or directory.
-
-
-
- 403.
- Date: Sun, 27 Aug 89 14:01:26 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: bug #225 has disappeared
-
- I've checked up on the bug I reported about sun3 include paths being used
- by default for sun4 compilations. It seems to be fixed, at least in all
- the test cases I could think of, so I removed the explicit sun4 include paths
- from the library.mk files, etc.
-
-
- 404.
- Date: Sun, 27 Aug 89 18:04:31 PDT
- From: gibson (Garth Gibson)
- Subject: mkdir error message
-
- The error message generated by "mkdir dirX" in an NFS directory where
- dirX already exists is not very informative:
- *** compat: Invalid message # for Gen module: status = 0x11
- mkdir: submit: invalid argument
-
-
- 405.
- Date: Mon, 28 Aug 89 10:32:18 PDT
- From: brent (Brent Welch)
- Subject: Re: Question about file system cache
-
- If the ld of your sun4 kernel migrated to a different machine
- then the disk full messages probably appeared there. If you
- checked Oregano's console you may have seen the messages there, too.
-
- An open will fail if the last writer of the file cannot write it back.
- I think this is the best behavior. It's better to abort the open
- than to get bad data. I'm not sure what error code is returned
- in this case, and perhaps that can be fixed. Even so, I don't
- think too many programs expect a "disk full" error from open().
-
-
- 406.
- Date: Mon, 28 Aug 89 10:40:44 PDT
- From: brent (Brent Welch)
- Subject: Re: mkdir error message
-
- The problem is that nfsmount is returning a UNIX error code
- and then the compatibility library is trying to map it
- from a Sprite to a UNIX code. I'll take a look at nfsmount.
- Eventually we'll convert back to all-UNIX error codes,
- but don't hold your breath.
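- The status in the report, 0x11, is 17, which is EEXIST ("file exists")
- in UNIX errno numbering. That fits Brent's explanation: nfsmount
- returned the UNIX code, and the compatibility library then looked it up
- as though it were a Sprite Gen status. The table and compat_map() below
- are a hypothetical, much-reduced stand-in for the real library.
-
```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, tiny stand-in for the compat library's table of Sprite
 * Gen-module status codes, indexed by message number. */
static const int spriteGenToUnix[] = { 0, EINVAL, EPERM, ENOENT };
#define NUM_GEN_CODES ((int)(sizeof spriteGenToUnix / sizeof spriteGenToUnix[0]))

/* Map a status assumed to be a Sprite Gen code to a UNIX errno.  A
 * value that is already a UNIX errno usually falls off the table. */
static int compat_map(int status)
{
    if (status < 0 || status >= NUM_GEN_CODES) {
        fprintf(stderr, "*** compat: Invalid message # for Gen module: status = 0x%x\n",
                (unsigned int)status);
        return EINVAL;
    }
    return spriteGenToUnix[status];
}
```
- Passing EEXIST through this mapping yields EINVAL, which matches
- "mkdir: submit: invalid argument".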
-
-
- 407.
- Date: Mon, 28 Aug 89 11:34:19 PDT
- From: eklee (Edward K. Lee)
- Subject: screen blanking on ds3100
-
- screen blanking does not seem to work on the ds3100.
-
-
-
- 408.
- Date: Mon, 28 Aug 89 11:36:41 PDT
- From: Fred Douglis <douglis>
- Subject: Re: screen blanking on ds3100
-
- sometimes it does, sometimes it doesn't. if you're going to be gone
- for a while, run "xgone" to make sure you have a screensaver running.
-
-
-
- 409.
- Date: Mon, 28 Aug 89 11:38:55 PDT
- From: Fred Douglis <douglis>
- Subject: Re: screen blanking on ds3100
-
- p.s. my last note was a bit terse, as i realized after i sent it.
- thanks for the report, and it's certainly something someone should
- look into at some point. i mentioned xgone as an interim solution,
- which means fixing the screensaver should be done but isn't as high a
- priority as it might otherwise be.
-
-
-
- 410.
- Date: Mon, 28 Aug 89 13:12:23 PDT
- From: Fred Douglis <douglis>
- Subject: another full fs bug
-
- I was trying to come up with a better test case for the pmake garbling
- bug (one that would demonstrate when the bug is truly fixed). I made
- the mistake of creating new files, with different $$ process ids,
- instead of reusing the same files. When /tmp filled up, and fenugreek
- tried to evict something writing to /tmp, the process froze and became
- unkillable. I wasn't aware that space was a problem, of course, since
- the message went to fenugreek and I was rlogin'ed. I went to lunch,
- and the problem resolved itself when space was freed.
-
- What happened here was that the fs callback took place with the
- process locked. I think I can fix this problem by changing migration
- not to keep the process locked while de-encapsulating it, except for
- proc-related operations.
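- The proposed change is to hold the process lock only for proc-related
- operations and drop it before any file-system work. A pthreads sketch
- of that idea follows. Proc, deencapsulate() and example_fs_step() are
- hypothetical. The example step only checks that it can take the lock,
- which it could not do if the lock were still held.
-
```c
#include <pthread.h>

/* Hypothetical process record with its own lock. */
typedef struct {
    pthread_mutex_t lock;
    int state;                 /* stands in for proc-related fields */
} Proc;

typedef void (*FsCallback)(Proc *procPtr);

/* Hold the process lock only for proc-related work, and release it
 * before calling into the file system, which may block (e.g. waiting
 * for space in /tmp) and may itself need the process. */
static void deencapsulate(Proc *procPtr, int newState, FsCallback fsStep)
{
    pthread_mutex_lock(&procPtr->lock);
    procPtr->state = newState;             /* proc-related operation */
    pthread_mutex_unlock(&procPtr->lock);

    fsStep(procPtr);                       /* fs work runs unlocked */
}

/* Example fs step: it can take the process lock only if
 * deencapsulate() has already released it. */
static int fsStepSawUnlocked = 0;

static void example_fs_step(Proc *procPtr)
{
    if (pthread_mutex_trylock(&procPtr->lock) == 0) {
        fsStepSawUnlocked = 1;
        pthread_mutex_unlock(&procPtr->lock);
    }
}
```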
-
-
- 411.
- Date: Mon, 28 Aug 89 14:15:41 PDT
- From: Fred Douglis <douglis>
- Subject: Re: access to printer lw608-8
-
- Ann,
-
- Printing from the decstations needs work. I've found that if I send
- something, it usually complains that the daemon doesn't exist, and if
- I then print something from a sun3 both the file(s) spooled from the
- ds3100 and the new file from the sun3 get printed. For the time
- being, I'd recommend that you rlogin to a sun3 and print from there.
-
- Also, please send mail about things on sprite not working to "bugs"
- rather than "root". They then get automatically filed and indexed
- accordingly.
-
-
- 412.
- Date: Mon, 28 Aug 89 14:53:44 PDT
- From: Fred Douglis <douglis>
- Subject: fs shared offsets race condition
-
- i just talked to brent some more about the file system migration
- problem. he's fixed some bugs already and will be testing the fixes
- on murder soon. but he just came up with another pathological case we
- have to deal with. consider the following sequence of events:
-
-
- process 1 forks process 2 with shared descriptor
- descriptor is at offset *
- processes 1 and 2 are told to migrate
- process 1 gets signal, starts to be encapsulated
- process 2 does I/O using shared descriptor, offset **
- process 2 gets signal, starts to be encapsulated
- process 2 completes migration
- other host gets offset ** for descriptor
- process 1 completes migration
- other host gets [old] offset * for descriptor
-
- brent suggested that we might associate a timestamp with each
- encapsulation, so that an earlier offset couldn't overwrite a later
- one. i'm a bit concerned that this might treat one symptom without
- curing the whole disease -- the idea of side-effect free, parallel
- encapsulation has me worried. if anyone has ideas of other
- pathological cases that might arise, please speak up. the design of
- the fs-migration interaction might warrant some discussion at an
- upcoming meeting.
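- Brent's timestamp idea can be sketched as a guard on installing an
- offset at the target host. SharedOffset and the install functions are
- hypothetical. The replay uses made-up offsets (100 for *, 200 for **),
- with process 1 encapsulated at time 1 and process 2 at time 3. As noted
- above, this covers only the particular ordering shown.
-
```c
/* Hypothetical record of a shared descriptor's offset on the target host. */
typedef struct {
    long offset;
    unsigned long stamp;   /* encapsulation time of the installed offset */
} SharedOffset;

/* Naive install: the last arrival wins, so a late-arriving older offset
 * overwrites a newer one. */
static void install_naive(SharedOffset *s, long offset, unsigned long stamp)
{
    s->offset = offset;
    s->stamp = stamp;
}

/* Timestamped install: ignore an offset captured earlier than the one
 * already installed. */
static void install_stamped(SharedOffset *s, long offset, unsigned long stamp)
{
    if (stamp >= s->stamp) {
        s->offset = offset;
        s->stamp = stamp;
    }
}
```
- In the replay, process 2's offset (200, time 3) arrives first and
- process 1's stale offset (100, time 1) arrives second. The naive
- version ends at 100. The stamped version keeps 200.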
-
-
- 413.
- Date: Mon, 28 Aug 89 15:29:51 PDT
- From: ouster (John Ousterhout)
- Subject: Re: fs shared offsets race condition
-
- It sounds to me like the problem with shared offsets is that they
- aren't handled at the right time during migration (i.e. there's
- a window of time where the offset is "neither here nor there").
- If an offset is shared, or even "possibly shared", wouldn't it be
- better to have the server take over responsibility for the offset
- at the beginning of migration rather than the end? Thus Fred's
- scenario would look like this:
-
- process 1 forks process 2 with shared descriptor
- descriptor is at offset *
- processes 1 and 2 are told to migrate
- process 1 gets signal, starts to be encapsulated
- -- during encapsulation, offset becomes shared, so server
- -- takes over responsibility for it. Server's offset = *
- process 2 does I/O using shared descriptor, offset **
- -- I/O is sent through to server, so server's offset gets
- -- updated to **
- process 2 gets signal, starts to be encapsulated
- process 2 completes migration
- -- since offset is shared, process 2's new host doesn't
- -- get offset at all.
- process 1 completes migration
- -- same as note above. In the unlikely event that process 1
- -- and process 2 are now on the same host again, so that
- -- the offset is no longer shared, the server could notify
- -- the client (during de-encapsulation) to cache the offset
- -- locally. The server would pass the client the correct
- -- offset to cache (**).
-
- Wouldn't this approach eliminate the window of vulnerability? I share
- Fred's concern that timestamps might solve one symptom while leaving
- other vulnerabilities; they smell tricky to me.
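- John's alternative can be modeled as a single stream whose offset is
- owned either by a client cache or by the server. The Stream type and the
- three functions are hypothetical. The point is only that once the server
- takes the offset at the *start* of encapsulation, no host installs a
- stale copy.
-
```c
#include <stdbool.h>

/* Hypothetical model of one shared stream: either a client caches the
 * offset, or (once the stream is possibly shared) the server owns it. */
typedef struct {
    bool serverOwns;
    long serverOffset;
    long clientOffset;
} Stream;

/* At the start of encapsulation, hand the offset to the server. */
static void begin_encapsulation(Stream *s)
{
    if (!s->serverOwns) {
        s->serverOffset = s->clientOffset;
        s->serverOwns = true;
    }
}

/* I/O on a server-owned offset goes through the server. */
static void do_io(Stream *s, long nbytes)
{
    if (s->serverOwns)
        s->serverOffset += nbytes;
    else
        s->clientOffset += nbytes;
}

/* Migration ships no offset while the stream is shared.  If the sharers
 * end up on one host again, the server hands back the current offset
 * for that client to cache. */
static void finish_migration(Stream *s, bool stillShared)
{
    if (s->serverOwns && !stillShared) {
        s->clientOffset = s->serverOffset;
        s->serverOwns = false;
    }
}
```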
-
-
- 414.
- Date: Mon, 28 Aug 89 16:34:17 PDT
- From: Fred Douglis <douglis>
- Subject: pdev, device deadlocks; kgdb backtracing
-
- mace got doubly wedged today. first, from paprika,
-
- mig -h mace csh -c "tail -f /dev/null&"
-
- caused the tail processes on mace to become unkillable, waiting in an
- RPC back to paprika. i'll debug paprika's end as well, once it's
- available.
-
- second, mace had an rlogind process waiting for a pdev open because
- the pdev was marked busy. the rlogind was unkillable. any other
- process trying to open the /hosts/mace/rlogin1 file it got blocked on
- also was wedged and unkillable.
-
- finally, i couldn't find out that much on mace because kgdb
- backtracing broke: after switching kernel stacks and going up stack
- frames, kgdb got confused and "info reg" produced
-
- ERROR: invalid read address 0x0
-
- as did any commands to print local variables.
-
-
- 415.
- Date: Mon, 28 Aug 89 18:30:56 PDT
- From: gibson (Garth Gibson)
- Subject: Not a sprite bug, but ....
-
- This is not necessarily a Sprite bug, but Spriters may need to beware.
- I had a file on Unix that I "mx"d on Sprite through NFS. I changed much of it
- and increased its length substantially. I had Mx write it, then tried to
- print it on rosemary. An old version with a bunch of trash at the end
- was printed. Sprite and other sun unix machines see the correct file,
- but rosemary appears to have suffered a caching problem. I should
- note that rosemary had been touching the file before and during the Mx
- session and the other unix machines did not touch it until Mx had quit.
- I should also note that I almost never use Mx across NFS (vi for NFS,
- Mx for Sprite - everything in its place). Rosemary remains confused
- about the file (even after a sync).
-
-
- 416.
- Date: Mon, 28 Aug 89 19:30:42 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Re: Not a sprite bug, but ....
-
- This problem plagued me frequently while I was doing the sun4 port working
- from rosemary. You can fix it by moving the file to a new name (on sprite)
- and touching and removing the old file name (from unix) and then moving the
- file back to its old name (from sprite) and then reaccessing it (from unix).
-
-
-
-
- 417.
- Date: Fri, 1 Sep 89 12:24:40 PDT
- From: eklee (Edward K. Lee)
- Subject: clarification on gremlin bug
-
- I was running sun3.new on mustard when I discovered the following bug.
- While I was running gremlin on ~eklee/raid.cont/config.grn and doing a pan (downward in
- this particular instance), gremlin crashed. Not only did gremlin crash, but
- I lost control of the mouse as well (Mustard was still up).
- Panning does not always cause gremlin to crash, but after you do several pans
- you gradually lose functionality. The first thing to go is your snap factor.
- It becomes very large for some reason and you can not get it below a certain
- level. Next, objects are displaced haphazardly. Finally, it becomes difficult
- to control the direction and magnitude of panning.
-
-
- 418.
- Date: Fri, 1 Sep 89 17:21:53 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: mail bug
-
- When I responded to John's mail about SYS_MAX_ARGS, using the "r" command,
- the mailer changed the bugs address to the bogus address
- sprite.berkeley:bugs@edu
-
- Below is the mailer daemon report.
-
- >From mgbaker Fri Sep 1 17:11:20 1989
- Received: by sprite.Berkeley.EDU (5.59/1.29)
- id AA919609; Fri, 1 Sep 89 17:11:17 PDT
- Date: Fri, 1 Sep 89 17:11:17 PDT
- From: MAILER-DAEMON (Mail Delivery Subsystem)
- Subject: Returned mail: Host unknown
- Message-Id: <8909020011.AA919609@sprite.Berkeley.EDU>
- To: mgbaker
- Status: R
-
- ----- Transcript of session follows -----
- 550 sprite.berkeley:bugs@edu... Host unknown
-
- ----- Unsent message follows -----
- Received: by sprite.Berkeley.EDU (5.59/1.29)
- id AA919600; Fri, 1 Sep 89 17:11:17 PDT
- Date: Fri, 1 Sep 89 17:11:17 PDT
- From: mgbaker (Mary Gray Baker)
- Message-Id: <8909020011.AA919600@sprite.Berkeley.EDU>
- To: jhh@sprite.Berkeley.EDU, sprite.berkeley:bugs@edu
- Subject: Re: SYS_MAX_ARGS redefined
-
- Oops. My fault. I thought I'd privately defined that one in machConst.h and
- moved it to sysSysCall.h when I started using the ASM stuff. I'll fix it.
-
-
- 419.
- Date: Fri, 1 Sep 89 17:22:48 PDT
- From: brent (Brent Welch)
- Subject: tx killed, csh -i looped
-
- 2 bugs - I killed tx by running my error stress test for
- the read system call. I passed a bad pointer for
- the read buffer, tx got an error from the pseudo-device
- code, and exited. I understand how to make this
- better - currently the code can't tell if the pseudo-device's
- request buffer is bad, or the user has a bad buffer;
- the cross-address space copy just gets a fault and it
- doesn't know who caused the problem. I can fix this by
- added extra code to determine what buffer is bad after
- the error occurs.
- bug 2 - the csh -i child process of the tx that paniced when into
- an infinite loop. I'm not sure what it was doing, but
- I imagine that this is repeatable.
-
- Repeat by:
- cd /sprite/src/benchmarks/read
- read -e
- (while running in a tx window, of course)
-
-
- 420.
- Date: Fri, 1 Sep 89 18:56:33 PDT
- From: douglis (Fred Douglis)
- Subject: kamikaze l1 key
-
- i accidentally hit l1-h instead of l1-k, or something like that. at least,
- the debugger said i was in the state i'd be in had i hit l1-h.
- unfortunately, that state was "in the debugger with a bus error exception"...
- looks like the routine to dump name hash stats needs to be a little
- more careful.
-
- this was repeatable on a sun3 after i killed a ds3100. kids, don't
- try this at home.
-
-
- 421.
- Date: Tue, 5 Sep 89 13:35:04 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: X server on ds3100 dies
-
- My X server frequently dies on the ds3100. It typically happens right after
- I start it. Sometimes my xclock window starts out huge -- the server always
- dies after this happens, but it also dies if it doesn't happen. The
- error message always is:
-
- X Error: request length incorrect; internal Xlib error
- Request Major code 74
- Request Minor code
- ResourceID 0x200040
- Error Serial #409
- Current Serial #422
-
- This is rather annoying since I have to kill and restart the ipServer each
- time and it takes more attempts to get the X server to stay up than I
- have patience for.
-
- I have been running kernel version 1.010 and some of my own kernels which
- use the uninstalled sources.
-
- I don't see any hope of fixing this bug since we don't have the sources,
- but I thought I'd get it recorded for posterity anyway.
-
-
- 422.
- Date: Tue, 5 Sep 89 15:16:54 PDT
- From: pmchen (Peter M. Chen)
- Subject: eqn differing behavior for sprite and unix
-
- One of my ditroff files prints out correctly under unix and not under sprite.
- The difference that I noticed is fractions have overlap between numerator and
- denominator (under sprite).
-
- The example file is in sprite:~pmchen/amdahl/sigmetrics/paper[12]. This should
- be the same as unix:~pmchen/sig/sigmetrics/paper[12]. To format the file,
-
- cd to ~pmchen/amdahl/sigmetrics (or ~pmchen/sig/sigmetrics)
-
- tbl -Ppulla paper* | grn %lw | eqn | ditroff -me %lw -h
-
- One of the example differences is on page 7, 5 text lines down from the top
- of the page. (N-1/N)
-
-
-
- 423.
- Date: Tue, 05 Sep 89 15:20:01 PDT
- From: Fred Douglis <douglis>
- Subject: problems with IP
-
- I wasn't able to log in to various unix machines -- for example, I
- could talk to ginger but not dill or rosemary. I then found I
- couldn't log into mint either, though migrating onto it showed that
- its ipServer was alive. It turned out there was a finger in the
- debugger and a bootp in an infinite loop. when i killed them off (I
- couldn't debug using migration), I could get arp responses and
- kvetching could now talk to other hosts, but I still couldn't log into
- mint. I then noticed that someone was in the process of running
- "restartservers" on mint, and that portmap was now in the debugger. I
- take it someone else walked over to mint to restart stuff.
-
-
- 424.
- Date: Wed, 06 Sep 89 11:07:49 PDT
- From: Fred Douglis <douglis>
- Subject: sld bug
-
- I tried to reinstall a new pmake without the debugging files, but the
- spur version wouldn't link. sld complained about the -mspur flag.
-
-
-
- 425.
- Date: Wed, 06 Sep 89 11:50:53 PDT
- From: Fred Douglis <douglis>
- Subject: another kiss of death
-
- paprika migrated onto, and killed, three hosts in parallel. fenugreek
- died with a "stack format error" exception. i'm checking mace now.
- what's more, paprika is acting strangely -- mary tried using a tx "set
- termcap" menu entry and it produced garbage. I wasn't able to find
- out too much on fenugreek, and am inclined to file this report and
- leave the problem alone unless it repeats.
-
-
-
- 426.
- Date: Wed, 06 Sep 89 12:29:41 PDT
- From: Fred Douglis <douglis>
- Subject: migration problem resolved: floating point problem?
-
- The problem from before happened during pmakes but not during explicit
- migrations using mig. Also, it happened just after i installed a new
- sun3 pmake, though I hadn't thought about that when the problem arose.
- i backed out pmake. must have something to do with programs that use
- hardware floating point.
-
-
-
- 427.
- Date: Wed, 6 Sep 89 14:44:54 PDT
- From: pmchen (Peter M. Chen)
- Subject: mustard crashed hard
-
- i was compiling a program in ~pmchen/raid
-
- cc -g -o multnew multnew.c -lm
-
- Message was:
-
- Exception 34 format at 0E007314
-
-
-
- 428.
- Date: Wed, 06 Sep 89 14:52:53 PDT
- From: Fred Douglis <douglis>
- Subject: Re: mustard crashed hard
-
- the sun3 cc was just reinstalled last night. were you doing the cc by
- hand or using pmake, which might have been doing it remotely? even if
- pmake didn't use the hardware floating point, if cc got migrated away
- and then evicted, it could have crashed your machine.
-
- i see you rebooted mustard. next time this happens, please try to
- login elsewhere, or call, to report the bug and give people a chance
- to look into the crash with the debugger. it's hard to diagnose after
- the fact.
-
-
-
- 429.
- Date: Thu, 7 Sep 89 10:14:52 PDT
- From: ouster (John Ousterhout)
- Subject: Mail return address
-
- Mail from us is still going out with a return address of
- "ouster%sprite.Berkeley.EDU@ginger.Berkeley.EDU" instead of
- just "ouster@sprite.Berkeley.edu". Won't the shorter form
- work OK (I've used it from WRL, for example)? If it works,
- can we change sendmail to use it?
-
-
-
- 430.
- Date: Thu, 7 Sep 89 10:31:22 PDT
- From: brent (Brent Welch)
- Subject: redirection bug?
-
- The following sequence of commands:
-
- rdate %timeServer > /dev/null &
- echo `date` `sysstat -v|sed -e 's/^Kernel.*1\.0 //' -e 's/) (/ /'` >! /hosts/%host/boottime
- cat /hosts/%host/boottime >> /hosts/%host/boottimes
-
- Occasionally puts more into the boottime file than expected:
- >>>>
-
- [1] Done rdate mint.Berkeley.EDU > /dev/null
- Thu Sep 7 02:22:38 PDT 1989 sage SPRITE VERSION 1.010 (sun3 30 Aug 89 17:20:32)
- <<<<
-
- There is an extra carriage return (^M) and the job control message,
- as well as the date and kernel stamp generated by the echo.
- This may well be a bug in csh, for all I know. But the
- csh output regarding the job gets sucked into the standard
- output stream of the next command.
-
-
-
- 431.
- Date: Thu, 07 Sep 89 12:11:07 PDT
- From: Fred Douglis <douglis>
- Subject: migration deadlock
-
- paprika wedged last night, and it only came back to life when it
- panicked with a full process queue. Turns out it did an open of
- /user1, which waited for recovery, and then deadlocked on the process
- itself because the process was locked during the open. I'll change
- it. I'm surprised this didn't bite us before (or maybe it did and we
- just didn't know it).
-
-
-
- 432.
- Date: Thu, 7 Sep 89 13:55:45 PDT
- From: douglis (Fred Douglis)
- Subject: allspice rpc wedge
-
- allspice stopped responding to RPCs. It could ping other hosts but they
- couldn't ping it. When I rebooted, I got a bunch of quick messages
- about hosts doing recovery, which implies that the act of shutting down
- killed something that was locking things up. An rpcstat -srvr showed
- a bunch of wait channels plus a consistently busy channel, with thyme doing
- a remove.
-
-
-
- 433.
- Date: Thu, 7 Sep 89 16:43:34 PDT
- From: shirriff (Ken Shirriff)
- Subject: ipServer problem on mint
-
- The ipServer went into the debugger with a bus error in CallTimeoutHandler
- line 806. I couldn't find the source files to debug further.
-
-
-
- 434.
- Date: Thu, 7 Sep 89 18:32:48 PDT
- From: pmchen@basil.berkeley.edu (Peter M. Chen)
- Subject: mustard crashed
-
- Message was: Entering debugger with a Bus Error exception at PC 0xe06798c
-
- Message in the syslog window was Fsdm_DomainFetch, bad domain number <341>
-
- I called Bob about it, he's looking into it. I need to reboot soon, so
- I'll do that when he's done.
-
-
-
- 435.
- Date: Fri, 08 Sep 89 10:59:31 PDT
- From: Fred Douglis <douglis>
- Subject: need new migration version for sun3s
-
- The recent change to the machine state caused an incompatibility
- between kernels. I am going to change migration to pass the size of
- key structures, such as Mach_UserState, to catch this sort of thing in
- the future. In any case, we need to build new kernels with a
- different migration version. (I think Bob may have been testing new
- kernels with a different version, but when I built my kernel with the
- uninstalled mach the other day I didn't know to do that.)
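- A minimal sketch of the size check described above. MigHeader,
- MIG_VERSION, and the fields shown for Mach_UserState are hypothetical
- placeholders, not Sprite's actual migration definitions; the idea is
- only that the receiving kernel compares the sizeof() values it was
- compiled with against the ones the sending kernel reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Mach_UserState; only its size matters here. */
typedef struct {
    uint32_t regs[16];
    uint32_t pc;
    uint32_t statusReg;
} Mach_UserState;

/* Hypothetical migration header: a version number plus the sizes of
 * structures that both kernels must agree on. */
typedef struct {
    uint32_t version;
    uint32_t userStateSize;
} MigHeader;

#define MIG_VERSION 2   /* hypothetical value */

/* Sending host: record what this kernel was built with. */
void MigFillHeader(MigHeader *hdr) {
    assert(hdr != NULL);
    hdr->version = MIG_VERSION;
    hdr->userStateSize = (uint32_t) sizeof(Mach_UserState);
}

/* Receiving host: accept the process only if both the version and the
 * structure size match this kernel's own definitions. */
int MigHeaderCompatible(const MigHeader *hdr) {
    assert(hdr != NULL);
    return hdr->version == MIG_VERSION &&
           hdr->userStateSize == sizeof(Mach_UserState);
}
```

- With this check, a kernel built against a changed machine state would
- refuse the migration outright instead of misreading the saved state.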
-
-
-
- 436.
- Date: Fri, 8 Sep 89 13:13:31 PDT
- From: eklee (Edward K. Lee)
- Subject: pmake could not find non-local include files
-
- I generated a Makefile after specifying non-local include directories via
- CFLAGS += -I../sim in a local.mk file.
- mkmf was able to find the non-local include files, but when I tried to run
- pmake it complained that it did not know how to make the non-local include
- files.
-
- The program I tried to compile is in ~eklee/raid.sim.
-
-
- 437.
- Date: Fri, 8 Sep 89 14:38:41 PDT
- From: shirriff (Ken Shirriff)
- Subject: ipServer bug
-
- The ipServer crashed on me again. The problem is timeoutList has
- the list pointer values (1,1) which give a seg fault. I suspect
- that memory is getting overwritten somewhere and is clobbering
- timeoutList, but I couldn't figure out where this was happening.
-
-
-
- 438.
- Date: Fri, 8 Sep 89 17:01:11 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: printer problem
-
- I tried printing out some files, and when they seemed to be taking a long
- time I checked the queue. It said it was waiting for paprika to come up.
- Paprika had been in the debugger for a long time, so I rebooted it. When
- paprika came up, nothing printed and the queue said it was empty.
-
-
-
- 439.
- Date: Fri, 8 Sep 89 17:37:44 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: sethostid broken
-
- sethostid is an ultrix binary that is used by the ds3100's during
- boot. Assault will not boot properly because sethostid dies with
- a bus error. I was trying to boot ds3100.new. Sethostid works on
- hijack, so there is something special about assault.
-
-
-
- 440.
- Date: Mon, 18 Sep 89 12:18:05 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 pagein at interrupt level
-
- Well, I've hit a new bug for the ds3100, though it could explain other
- problems; who knows? kvetching died with an "interrupt" exception.
- Its pc was in Mach_EnableIntr at the point where it returns after
- enabling interrupts. It was in an RPC page read at the time, with a
- backtrace going all the way up to:
-
- 20 Vm_PageIn(virtAddr = 0x10005000, protFault = 0) ["vmPage.c":1523, 0x800cd114]
- 21 .block544 ["jhh.md/vmPmax.c":1465, 0x800d1dec]
- 22 VmMach_TLBFault(virtAddr = 0x10005000) ["jhh.md/vmPmax.c":1465, 0x800d1dec]
- 23 .block13 ["jhh.md/machCode.c":1022, 0x80034644]
- 24 MachKernelExceptionHandler(statusReg = 64560, causeReg = 805314572, badVaddr = 0x10005000, pc = 0x800d2ffc = "") ["jhh.md/machCode.c":1022, 0x80034644]
- 25 Mach_KernGenException(0x800fa298, 0x34, 0xc0109234, 0x2, 0xc054ff54) ["jhh.md/machAsm.s":506, 0x80032854]
- 26 Vm_MachDumpTLB(0x800fa298, 0x34, 0xc0109234, 0x2, 0xc054ff54) ["jhh.md/vmPmaxAsm.s":719, 0x800d2ff8]
-
- IdleLoop looks like it was trying to panic, because the interrupt
- nesting wasn't 0, but the check I put in Interrupt beat it to it.
- Unfortunately, it never made it to the screen (maybe got buffered for
- my syslog instead), so I didn't know what was going on -- Interrupt
- used printf instead of panic. Anyway, how do we keep the whole page-in
- from being done at interrupt level? Should it be done there in the
- first place if it's because of a TLB flush?
-
-
-
- 441.
- Date: Mon, 18 Sep 89 22:31:44 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: rawstat in the debugger
-
- Many invocations of the program rawstat seem to pile up in the debugger on
- anise, and tonight when dumping the process table on mint, I noticed rawstat
- was in the debugger there as well.
-
-
-
- 442.
- Date: Tue, 19 Sep 89 11:35:11 PDT
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: mint rpc wedge
-
- mint wedged during recovery again. This time I got into the debugger and
- poked around. I found a ton of rpc servers waiting on their BUSY flag
- and the rpc daemon doing an RpcProbe to sage. Is it possible that
- the daemon is ignoring other RPCs while its RpcProbe is taking place,
- or something? Anyway, when I wasn't able to find a definite
- cause of the problem, I continued mint, and this time people seemed
- to recover okay. (an aside: assault was shutdown during the interim, so
- if there's anything going on relating to the number of hosts recovering
- simultaneously, that might be relevant.)
-
- also, sage must be wedged itself. it recovered with mint only when i typed
- at its console, and even then, i wasn't able to start "ping" when i
- tried to ping mint from sage. nor does sage respond to pings, or let
- me ^C out of the ping. I'm debugging it now.
-
-
-
- 443.
- Date: Tue, 19 Sep 89 12:45:24 PDT
- From: Fred Douglis <douglis>
- Subject: cc bug: rpn won't compile
-
- I installed a new rpn, with a patch from Andy that fixes the hex
- display problem for large numbers. However, when I tried to recompile
- for the sun3 to make sure I didn't break anything, I found that cc
- hits a bus error trying to compile src/main.c. Any cc guru care to
- take a look?
-
-
-
- 444.
- Date: Tue, 19 Sep 89 14:10:22 PDT
- From: Fred Douglis <douglis>
- Subject: disk library won't compile
-
- c/disk no longer compiles -- complains that kernel/fsDisk.h no longer
- exists. I looked for kernel/fs*Disk but couldn't find a renamed
- version. What gives?
-
-
-
- 445.
- Date: Tue, 19 Sep 89 15:36:02 PDT
- From: brent (Brent Welch)
- Subject: Re: fs header file changes
-
- fsDisk.h is now fsdm.h. I recently moved all the old versions
- of fs header files in Include to a different place so old code
- that should be fixed won't compile.
-
-
-
- 446.
- Date: Tue, 19 Sep 89 15:57:15 PDT
- From: pmchen (Peter M. Chen)
- Subject: latest crash on raid
-
- Fatal Error: VmMach_DMAAlloc: unable to satisfy request for 65536 bytes at
- 0xf655c8b8
-
- This was when 8 processes each requested 64KB. The kernel was sun4.md/mgbaker
-
- ~pmchen/raid/mult/ex1 /dev/rsvj1 1
-
- (I was in ~pmchen/raid/mult)
-
-
-
- 447.
- Date: Tue, 19 Sep 89 15:54:46 PDT
- From: shirriff (Ken Shirriff)
- Subject: rcp from decstation hangs
-
- I tried to copy a kernel from pride (decstation) to dill and the first
- time it stopped after copying 106496 bytes and the second time it
- stopped after copying 270336 bytes. By stopping I mean the rcp command
- sat there for several minutes and then reported "rcp: lost connection".
- I tried the copy from nutmeg (sun3) and all 1929348 bytes were copied
- without problem.
-
-
-
- 448.
- Date: Tue, 19 Sep 89 18:18:39 PDT
- From: brent (Brent Welch)
- Subject: MACH_EXC_BUS_ERR_LD_ST panic
-
- Apathy crashed on Garth with a panic from MachUserExceptionHandler.
- It got a fault 'cause' of MACH_EXC_BUS_ERR_LD_ST, and panic'd
- with a message: "User bus error on ld or st". Why is this a panic?
-
-
-
- 449.
- Date: Wed, 20 Sep 89 10:17:34 PDT
- From: gibson (Garth Gibson)
- Subject: what are these "LE ethernet: Received packet with CRC error." messages?
-
- I've seen them shortly after starting X11 on apathy and just now shortly after
- login to pepper (both ds3100s). pepper runs FD.029 (CLEANds3100) (19 Sep 89)
-
-
-
- 450.
- Date: Wed, 20 Sep 89 15:59:24 PDT
- From: Fred Douglis <douglis>
- Subject: mkmf change and bug fix
-
- The implementation of mkmf was different from the documentation. The
- documentation claims that if ./mkmf.local exists, it will be used, but
- the program actually looked for ./mkmf -- which is a mistake since if
- someone has "." in the path before /sprite/cmds, they'll invoke the
- script without the proper environment variables. I've changed mkmf.
- If anyone was relying on the broken behavior, and had a "mkmf" script
- instead of "mkmf.local", they should change it.
-
-
-
- 451.
- Date: Thu, 21 Sep 89 12:18:19 PDT
- From: pmchen (Peter M. Chen)
- Subject: raid crash
-
- I crashed raid by running 16 concurrent processes, each asking for 512 bytes.
- Actually, I think only 6 of them got started running. Only 6 * 512 bytes should
- easily fit in the DVMA space, yes? Nothing came out on the /dev/syslog, and
- I'm not at the console to look, but I'll ask Ken (or whoever) to look at the
- console when he gets in and send you the message.
-
- Ed remembered that the requests are aligned on some large boundary (128K?)
- to avoid some of the cache flushing problems. What happens if alignment
- is not possible?
-
-
-
- 452.
- Date: Wed, 20 Sep 89 16:31:17 PDT
- From: Fred Douglis <douglis>
- Subject: mkmf bigcmdtop bug
-
- If you say mkmf at the top level before running mkmf in the
- subdirectories, it tries to make depend and complains that */Makefile
- doesn't exist.
-
-
-
- 453.
- Date: Wed, 20 Sep 89 16:46:56 PDT
- From: brent (Brent Welch)
- Subject: hung gdb
-
- There is a hung gdb process on sage. I quit gdb while the
- program was at a breakpoint. The program was not continued
- by gdb, and it hung. I'm leaving it in the current state,
- and I'm even willing to let someone debug sage (ask first!)
- if they need to. I was able to suspend gdb and put it
- into the background, and I could probably kill it too.
- However, it shouldn't behave this way so it would be great
- if someone took a look at it.
-
-
-
- 454.
- Date: Thu, 21 Sep 89 09:46:17 PDT
- From: ouster (John Ousterhout)
- Subject: Another trashed file
-
- The file ~ouster/162/notes/t05 has become corrupted sometime
- between January 6 and today: the end of the file is a bunch
- of control characters (perhaps some machine code?) preceded by
- the following characters:
-
- openOpen file %s
- lseekreadError 0x%x from Proc_SetPriority
- seek time %4d.%-03d
- seek and read to 0x%x time %4d.%-03d
-
- I moved this file to /user1/trashed.
-
-
-
- 455.
- Date: Thu, 21 Sep 89 11:02:51 PDT
- From: Fred Douglis <douglis>
- Subject: cc1.68k optimization bug
-
- cc1.68k goes into the debugger trying to compile
- /a/newcmds/ixgraph/src/xgraph.c. This file compiles okay on the
- ds3100 and also compiles okay when optimization is disabled.
-
-
-
- 456.
- Date: Thu, 21 Sep 89 12:08:01 PDT
- From: brent (Brent Welch)
- Subject: recovery trashed file
-
- I caught a file getting corrupted after recovery.
- I was generating data to a file when oregano crashed.
- The last block ended up having data from a temporary .s file.
- There were 2640 bytes in the 4th block, and they were
- all from the wrong file. I suspect that the output file was
- caught in the middle of growing a fragment (from 2K to 3K)
- and the cache didn't get written out properly when
- Oregano crashed. I'm pretty sure the file was not being
- cached on the clients because I was generating it at
- sloth and I had just looked at it on sage. I'll go
- scan the cache code to see if UpgradeFragment is vulnerable.
- brent
- ps. Oregano crashed with the known bug in (sun3) 1.022
- hmm... the bug only happens when the cache is full too.
- the plot thickens.
-
-
- 457.
- Date: Thu, 21 Sep 89 12:11:11 PDT
- From: pmchen (Peter M. Chen)
- Subject: oregano crash--netroute
-
- After the oregano crash this morning, I had to manually run
- netroute -s -f /etc/spritehosts
-
- in order to have raid know about oregano. Can this be put in oregano's
- bootup script?
-
-
-
- 458.
- Date: Thu, 21 Sep 89 13:47:14 PDT
- From: brent (Brent Welch)
- Subject: Re: recovery trashed file
-
- I'm pretty sure my hunch is right. UpgradeFragment
- is in charge of finding a larger fragment for a cache block
- that is growing in size from 1K to 2K, 2K to 3K, etc.
- It does this by fetching the cache block containing the
- previous version of the fragment (allocation happens
- before the write), changing the file descriptor's
- indexing structure, and then unlocking the cache block
- while assigning it to the new disk location.
- The order of these last two steps is wrong, I think,
- especially because the operation that shifts the
- cache block's disk address puts it on the dirty list,
- but it might wait if the old version of the block
- is undergoing I/O. Thus, the scenario of Oregano's
- crash (due to a stupid coding mistake of mine that only showed
- up when the cache was full...) is that the file descriptor
- was modified to refer to the new location, but the
- cache block was held up, and it never got onto the
- dirty list (again) with a new disk address associated with it.
- Et voila, when Oregano rebooted the file descriptor
- referenced the wrong fragment.
-
- I've simply re-ordered the operations in UpgradeFragment
- so it unlocks the cache block first, and then updates
- the file descriptor. Thus the worst case is that
- the cache block gets successfully re-assigned to a new
- block, but the file descriptor doesn't get updated.
- Oh, it is already true that the old fragment is
- freed at the very end, and that seems ok.
-
- The fix for this is in fsdm, and I've got a new sun3.md/brent
- kernel that has this fix, plus a fix in fscache for the bug that
- caused Oregano to crash in the first place. All hosts
- that run the newly installed .new kernel are vulnerable
- to the Bus Error causing bug that is now fixed in fscache.
- I'll probably make a new .new kernel with the fix.
-
-
-
- 459.
- Date: Thu, 21 Sep 89 19:53:12 PDT
- From: Fred Douglis <douglis>
- Subject: Re: MACH_EXC_BUS_ERR_LD_ST panic
-
- > Apathy crashed on Garth with a panic from MachUserExceptionHandler.
- > It got a fault 'cause' of MACH_EXC_BUS_ERR_LD_ST, and panic'd
- > with a message: "User bus error on ld or st". Why is this a panic?
-
- The uninstalled mach now kills the user process instead, since I
- couldn't see any reason for the panic either. Don't think this has
- made it into a new kernel yet, though.
-
-
-
- 460.
- Date: Thu, 21 Sep 89 18:02:24 PDT
- From: arc%sgi.sgi.com@sgi.sgi.com (Andrew Cherenson)
- Subject: rcsinfo/rcstell missing?
-
- On allspice, rcsinfo & rcstell are missing from /sprite/cmds.
-
-
-
- 461.
- Date: Sat, 23 Sep 89 23:57:33 PDT
- From: tve (Thorsten von Eicken)
- Subject: help: allspice:/mic seems pretty corrupted
-
- I get directories which contain pieces of files and an fscheck
- (I did: fscheck -dev rsd10 -part c)
- shows tons of "File nnnnn contains duplicate block nnnnn.".
- HELP! Can someone see what's bad?
-
- I suppose the disk will have to be reinitialized... please try to keep
- "/mic/tve" (except for /mic/tve/src/ftp, which is also corrupted...)
-
- Thanks,
- Thorsten
- NB: is there a way to thoroughly test the disk?
-
-
-
- 462.
- Date: Sat, 23 Sep 89 17:26:55 PDT
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: mint crash
-
- Mint died in FslclLookup because a handle wasn't locked. I am logged in from
- home so I can't debug too well (no scrollbars :)... but I did see that the
- name it was trying to open was "./../" if that means anything. Do we
- have kgcore on unix? Might be nice to have if not....
-
-
-
- 463.
- Date: Sun, 24 Sep 89 19:10:46 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Re: help: allspice:/mic seems pretty corrupted
-
- This is the same problem we saw before when Martha Zimet first copied
- a bunch of new files onto allspice. Are the files actually corrupted,
- or did you just get a ton of messages from fscheck? If I remember
- correctly, last time no action was necessary because the files weren't
- actually corrupted. Some count just wasn't correct and fscheck thought
- things were unhappy.
-
-
-
- 464.
- Date: Mon, 25 Sep 89 08:26:37 PDT
- From: ouster (John Ousterhout)
- Subject: Re: /dev/tty bug (was Re: anonymous ftp problem)
-
- Removing a bogus /dev/tty is good for now, but I suspect that
- it's there because there's a program around somewhere that opens
- /dev/tty in create mode. If this hunch is right, /dev/tty is
- going to keep re-appearing until we find the program and change
- it not to create /dev/tty.
-
-
-
- 465.
- Date: Mon, 25 Sep 89 09:08:27 PDT
- From: rab (Robert A. Bruce)
- Subject: piquante
-
- Piquante is in the debugger with a coprocessor unusable exception.
-
- MachKernelExceptionHandler: Coprocessor unusable
- Entering debugger with a Coprocessor unusable exception at PC 0x800c108c
-
-
-
- 466.
- Date: Mon, 25 Sep 89 14:25:09 PDT
- From: ouster (John Ousterhout)
- Subject: Sendmail died
-
- Sendmail went into the debugger on Mace. Anyone interested in
- looking at it? I'm leaving the corpse around.
-
-
-
- 467.
- Date: Mon, 25 Sep 89 17:05:32 PDT
- From: Fred Douglis <douglis>
- Subject: update for ds3100
-
- ds3100 update is an old binary. it won't compile under sprite (the installed
- version must have come from WRL), and it doesn't work running on a
- ds3100 for ds3100-based files. "update ~brent/postrawstats ~/..."
- created a directory but didn't copy any files. running on a sun3
- worked fine.
-
- the problems relate to N_TXTOFF and similar incompatibilities.
-
-
-
-
- 468.
- Date: Mon, 25 Sep 89 20:07:36 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: something funny with recovery
-
- I came back from aerobics once again to find the machines in a strange state.
- It appeared that allspice had been rebooted twice. The first time, fenugreek
- went through recovery. Then, according to fenugreek's syslog, there was a hung
- rpc echo to allspice. Then allspice rebooted again, but fenugreek didn't get
- recovery.
-
- I went up to allspice, and it thought it was quite happy. I rpc ping'd
- fenugreek and some other machines, and they responded. After about 5 minutes,
- and a few ls's and such, all of a sudden a whole bunch of machines went
- through recovery, including fenugreek. But fenugreek's window system was
- still frozen. I finally rebooted fenugreek with the new kernel.
-
-
-
- 469.
- Date: Tue, 26 Sep 89 07:12:45 PDT
- From: brent (Brent Welch)
- Subject: Re: something funny with recovery
-
- Two things. First, Allspice crashed with a "non-aligned" read.
- It printed a message about a 1024 byte read at about 16K and
- then hung. This happened while I was rebooting mint with the
- new .new kernel last night. Being in a hurry I just tried to
- reboot allspice, and then realized I hadn't installed dev,
- so it didn't see its new disk. I left allspice in single-user
- mode while I installed dev using mint. I then rebooted allspice.
- Anyway, that fenugreek didn't recover correctly is still a bug.
- There have been a few cases recently where machines don't seem
- to be pinging a server, so I'll look into it. Most machines
- seemed to recover ok. It was rather stressful on the system
- because I rebooted assault, then mint, then allspice. Perhaps
- a pagefault was waiting on recovery and somehow blocked enough
- things to prevent pinging. If both page faults and pinging
- are handled with Proc_CallFunc(), then this may be the problem.
- The proc_ServerProc's may be all used up waiting to page something in.
-
-
-
- 470.
- Date: Tue, 26 Sep 89 10:11:23 PDT
- From: Fred Douglis <douglis>
- Subject: allspice chucked my files
-
- Something very strange happened yesterday. My directory apparently
- got reverted to an older version. I had done an update from one
- directory into another, on /user1. I edited in that directory for an
- hour or two and then left. Allspice rebooted various times. This
- morning, my files were all as they were before I'd edited them, and
- the backup copies created by emacs didn't exist. My interpretation of
- this is that the directory was somehow reverted, so the inodes in the
- directory that pointed to backup versions were still valid under their
- original names. The one file I'd edited on two different machines was
- intact, however. That is, I did a lot of work on kvetching, then some
- work in the same directory a moment later on paprika, then eventually
- back to kvetching.
-
- I recommend that people look through /user1/lost+found to make sure
- nothing of theirs is missing.
-
-
-
- 471.
- Date: Tue, 26 Sep 89 11:08:37 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: update won't compile
-
- Update will not compile for the ds3100's. The existing update does not
- work right when run on a ds3100 when the files are on a ds3100 file server.
- I think the problems are due to Mike's changes to the a.out.h macros
- (N_TXTOFF and others). I will add this fix to my queue but want it
- recorded in case I forget.
-
-
-
- 472.
- Date: Tue, 26 Sep 89 12:07:28 PDT
- From: Fred Douglis <douglis>
- Subject: vm recovery problem
-
- i've started to get a clue about why various machines are wedging
- after allspice reboots. paprika was also wedged this morning. when i
- debugged it, i found a lot of processes waiting on the vm monitor
- lock, but the lock wasn't held. i poked around but couldn't find an
- explanation, so i finally continued the machine. surprisingly, it
- came out of its stupor, but only enough to complain about I/O errors
- in Fs_Dispatch, failed recovery with allspice, and finally a negative
- reference count on closing the swap file for one of the processes
- that bought it on a page-in error.
-
-
-
- 473.
- Date: Wed, 27 Sep 89 15:30:43 PDT
- From: Fred Douglis <douglis>
- Subject: restarting system calls from migration
-
- The migration database got locked again, and this time I was able to
- poke around while it was still locked. Turns out what's happening is
- a result of a change I made a few weeks ago to try to make migration
- transparent. Just as Fs_Read is really a C routine that makes a
- system call in a loop in case of interrupts, Fs_IOControl was changed
- to do this as well. That's because there were programs that would die
- because they got migrated during an ioctl and they got back an EINTR
- result they weren't expecting. On the other hand, it turns out that
- retrying ioctls that one would normally expect to abort (because of a
- real signal rather than a migration pseudo-signal) causes problems.
- For example, loadavg ends up retrying a blocking flock even after its
- alarm goes off, thereby sleeping forever.
-
- So, what to do? I suppose there's no easy way for user-level routines
- to find out what signal caused a system call to abort. I could
- special case migration by returning a different return status, which
- would be a pain, or I could add a system call to determine the last
- signal delivered, which would also be a pain. Any better ideas? This
- sort of issue has come up before with respect to things like sigpause:
- a process blocks every signal, assuming that nothing it can survive
- will signal it (KILL & such would blow it away anyway), and then
- migration causes it to get signalled. In that case, I could see what
- signal was pending for it and return a GEN_ABORTED_BY_SIGNAL when the
- only signal was migration-related; then the user-level routine would
- know to try again. A more general solution would certainly be
- preferable.
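- One way to picture the "different return status" option: a user-level
- wrapper that retries only when the kernel reports that migration alone
- interrupted the call. This is a hypothetical sketch; the Status codes,
- RetryAfterMigration, and the scripted test stub are illustrative names,
- not Sprite's ReturnStatus values or library routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; Sprite's real ReturnStatus values differ. */
typedef enum {
    STATUS_SUCCESS,
    STATUS_ABORTED_BY_SIGNAL,   /* a real signal, e.g. loadavg's alarm */
    STATUS_ABORTED_BY_MIGRATION /* only the migration pseudo-signal was pending */
} Status;

/* Any interruptible system call, such as a blocking ioctl or flock. */
typedef Status (*SysCallFn)(void *arg);

/*
 * Retry only when migration alone interrupted the call.  A real signal is
 * passed back to the caller, so a blocking flock still gives up when an
 * alarm fires instead of sleeping forever.
 */
Status RetryAfterMigration(SysCallFn call, void *arg) {
    Status status;

    assert(call != NULL);
    do {
        status = call(arg);
    } while (status == STATUS_ABORTED_BY_MIGRATION);
    return status;
}

/* Test stub: returns a scripted sequence of statuses, one per call. */
typedef struct {
    const Status *script;
    int next;
} Script;

Status ScriptedCall(void *arg) {
    Script *s = arg;
    return s->script[s->next++];
}
```

- Retrying on every interruption, as Fs_IOControl now does, corresponds
- to looping on STATUS_ABORTED_BY_SIGNAL as well, which is exactly what
- leaves loadavg asleep after its alarm.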
-
-
-
- 474.
- Date: Tue, 26 Sep 89 18:23:44 PDT
- From: brent (Brent Welch)
- Subject: Blocks => sector mapping broken
-
- The mapping from blocks to sectors is broken with the
- -scsi option to fscheck. It turns out that the mapping
- from file system blocks to disk sectors
- assumes that "rotational sets" completely take up a whole
- number of tracks. With the -scsi option to fscheck this isn't true,
- so the calculation of the 'firstSector' variable in
- the DiskBlock I/O routines is broken. We were just lucky
- with the other disks, and we weren't lucky with this one.
- With different geometries the bug will either overlap
- the rotational sets or it will separate them by some
- sectors. Obviously we are overlapping them in this case.
- (rotational sets are groupings of blocks where each block
- has a different rotational offset. The idea is/was to pack
- blocks onto sectors and get a skewed location between blocks
- on different tracks, sort of like a brick wall where the
- ends of bricks on different layers don't line up. Each cylinder
- is divided into a number of rotational sets.)
-
- Here is the (broken) mapping:
- firstSector =
-     geoPtr->sectorsPerTrack * geoPtr->numHeads * cylinder +
-     geoPtr->sectorsPerTrack * geoPtr->tracksPerRotSet * rotationalSet +  /* wrong */
-     geoPtr->blockOffset[blockNumber];
-
- I'm not sure of the best way to fix this. Adding a
- sectorsPerRotSet to the Fs_Geometry structure would be best.
- However, this will be painful because the Fs_Geometry is written on
- the disk. We could write a utility that munges our headers
- to conform to a new Fs_Geometry structure, but that sounds
- rather exciting. Alternatively we could pitch the notion of
- rotational sets altogether, but again we have the problem of
- all our current disks built on the old mapping. Another approach
- would be to detect this situation and use a different mapping.
- The bad situation occurs when
- geoPtr->sectorsPerTrack * geoPtr->tracksPerRotSet <
- DISK_SECTORS_PER_BLOCK * geoPtr->blocksPerRotSet
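To make the current mapping and the sectorsPerRotSet alternative concrete, here is a sketch in C. The Geometry struct carries only the Fs_Geometry fields used in the mapping; its layout, the MAX_BLOCKS_PER_ROT_SET bound, and the sectorsPerRotSet parameter (standing in for the proposed new field) are assumptions for illustration. DISK_SECTORS_PER_BLOCK is taken as 8, matching the 8 * 17 arithmetic below.

```c
#include <assert.h>

#define DISK_SECTORS_PER_BLOCK 8    /* matches the 8 * 17 arithmetic */
#define MAX_BLOCKS_PER_ROT_SET 32   /* hypothetical bound for this sketch */

/* Subset of Fs_Geometry used by the mapping; layout is illustrative only. */
typedef struct {
    int sectorsPerTrack;
    int numHeads;                   /* tracks per cylinder */
    int tracksPerRotSet;
    int blocksPerRotSet;
    int blockOffset[MAX_BLOCKS_PER_ROT_SET]; /* sector offset within a rot set */
} Geometry;

/* Current mapping: assumes every rot set fills a whole number of tracks. */
int FirstSectorOld(const Geometry *g, int cylinder, int rotSet, int block) {
    assert(block >= 0 && block < g->blocksPerRotSet);
    return g->sectorsPerTrack * g->numHeads * cylinder +
           g->sectorsPerTrack * g->tracksPerRotSet * rotSet +
           g->blockOffset[block];
}

/*
 * Proposed mapping: step between rot sets by an explicit sectorsPerRotSet.
 * The number of rot sets per cylinder would also have to be chosen so that
 * rotSetsPerCyl * sectorsPerRotSet still fits in one cylinder.
 */
int FirstSectorNew(const Geometry *g, int sectorsPerRotSet,
                   int cylinder, int rotSet, int block) {
    assert(block >= 0 && block < g->blocksPerRotSet);
    assert(sectorsPerRotSet >= DISK_SECTORS_PER_BLOCK * g->blocksPerRotSet);
    return g->sectorsPerTrack * g->numHeads * cylinder +
           sectorsPerRotSet * rotSet +
           g->blockOffset[block];
}
```

With Allspice's geometry (53 sectors per track, 1 track and 11 blocks per rot set) and, as a simplifying assumption, blocks packed back to back (blockOffset[i] = 8 * i, no skew), FirstSectorOld starts rot set 1 at sector 53 while block 10 of rot set 0 still occupies sectors 80 through 87; FirstSectorNew with sectorsPerRotSet = 88 starts rot set 1 at sector 88.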
-
- For example, the Wren IV disks on Oregano have:
- sectorsPerTrack 46
- blocksPerRotSet 17
- tracksPerRotSet 3
- tracksPerCyl 9
-
- Each RotSet is allocated 46 * 3 sectors, or 138,
- and 17 blocks take up 8 * 17 sectors, or 136.
- So there are two wasted sectors after each rotational set, or 6 wasted
- per cylinder (9 tracks per cylinder / 3 tracks per RotSet = 3 RotSets).
-
- However, with the Wren VI disk on Allspice:
- sectorsPerTrack 53
- blocksPerRotSet 11
- tracksPerRotSet 1 (!!!)
- tracksPerCyl 15
-
- each RotSet is allocated 53 * 1 sectors, or 53,
- but 11 blocks take up 8 * 11 sectors, or 88, so consecutive RotSets
- overlap by 35 sectors.
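The detection condition can be checked against both disks with a few lines of C. DiskGeom and the function names are hypothetical; the numbers are the ones quoted above, and DISK_SECTORS_PER_BLOCK is again assumed to be 8.

```c
#include <assert.h>

#define DISK_SECTORS_PER_BLOCK 8

/* The geometry numbers quoted for each disk (hypothetical struct). */
typedef struct {
    int sectorsPerTrack;
    int blocksPerRotSet;
    int tracksPerRotSet;
    int tracksPerCyl;
} DiskGeom;

/* Sectors allotted to one rot set minus sectors its blocks need.
   Negative means consecutive rot sets overlap. */
int RotSetSlack(const DiskGeom *d) {
    return d->sectorsPerTrack * d->tracksPerRotSet
         - DISK_SECTORS_PER_BLOCK * d->blocksPerRotSet;
}

/* The "bad situation" condition: allotted sectors < sectors needed. */
int RotSetsOverlap(const DiskGeom *d) {
    return RotSetSlack(d) < 0;
}

/* Slack summed over all rot sets in one cylinder. */
int SlackPerCylinder(const DiskGeom *d) {
    assert(d->tracksPerRotSet > 0);
    return RotSetSlack(d) * (d->tracksPerCyl / d->tracksPerRotSet);
}
```

For Oregano, RotSetSlack gives 2 and SlackPerCylinder gives 6; for Allspice, RotSetSlack gives 53 - 88 = -35, so RotSetsOverlap is true.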
-
- Finally, if the -noscsi option to fsmake is specified then my
- original logic will correctly fit the rotational sets onto
- whole numbers of tracks, but there might be more wasted sectors.
- I can't log into Allspice to see how much would be wasted because
- its login is in the debugger, and all attempts to rlogin suffer
- the same fate.
-
-
-
- 475.
- Date: Tue, 26 Sep 89 21:21:43 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: gdb on sun4 broken
-
- I was running gdb on allspice and was unable to step after I hit a
- breakpoint. I was running gdb version 2.7(?). Is that version still
- needed?
-
-
-
- 476.
- Date: Wed, 27 Sep 89 12:05:00 PDT
- From: pmchen (Peter M. Chen)
- Subject: error from ls
-
- mustard% ls
- *** compat: Cannot decode user status value 0xffffffff
- 262/ cmds/ leslie/ raid/ tmps
- 80col cmds.sun3/ library/ reminders tt*
- News/ conferences/ mail/ simul/ verses/
- amdahl/ dead.article me@ spritereport viv/
- bin/ dead.letter misc/ talks/ writeups/
- c3/ donna/ notes/ tapes/ xtroff/
- calendar info/ perf/ tmp/
-
- I also had problems logging in from envy last night (and it didn't respond
- to pings). If you want to look at it, feel free (I'm going to be gone 'til
- 1:00pm). I'm going to reboot it at 1pm, though.
-
- Here's a look at the syslog:
-
- Broadcasting for server of "/sprite/src/kernel"
- RPC srvr 62c2c
- RPC srvr 62c2e
- Broadcasting for server of "/user2"
- RPC srvr 92c32
- Broadcasting for server of "/spur2"
- RpcDoCall: <stat> RPC to oregano is hung
- <getIOAttr> 9/26/89 20:13:27 lust (1) RPC timed-out
- Fsrmt_GetIOAttr failed <30002>: device <0,0> at server 1
- 9/26/89 22:36:13 anise (49) rebooted
- <stat> RPC exit 0xffffffff
- Broadcasting for server of "/sprite2"
- 9/27/89 10:39:08 lust (1) rebooted
- 9/27/89 10:40:43 anise (49) rebooted
- 9/27/89 11:23:49 kvetching (2) rebooted
- 9/27/89 11:34:55 lust (1) rebooted
-
-
-
- 477.
- Date: Wed, 27 Sep 89 12:10:10 PDT
- From: Fred Douglis <douglis>
- Subject: Re: error from ls
-
- the hung rpc was because oregano's ipServer went into the debugger. I
- killed it and restarted oregano's daemons late last night. When it
- came back, things weren't quite right: I didn't recover /sprite2, and
- I got -1 status values (0xffffffff) for the things in progress at the
- time I killed the ipServer. I then killed and restarted the mount of
- /sprite2 by hand and things worked better. I didn't file a bug report
- on this because it seemed like the same problem Thorsten had recently
- when he couldn't reach an NFS disk, though perhaps this is a different
- problem after all.
-
-
-
- 478.
- Date: Wed, 27 Sep 89 16:25:40 PDT
- From: Fred Douglis <douglis>
- Subject: exec bug: trashing memory
-
- Thorsten repeatedly crashed his machine by accidentally invoking a
- shell script that called itself recursively with more args every time.
- This is on my to-do list, but I wanted to file the bug report to make
- sure I don't lose it and that no one else wastes time tracking down
- the bug.
-
-
-
-
- 479.
- Date: Wed, 27 Sep 89 17:31:44 PDT
- From: pmchen (Peter M. Chen)
- Subject: mail screwed up
-
- I'm having trouble mailing things out (they go out with a null message body).
- The message I was going to send was about pmake errors.
-
-
-
-
- 480.
- Date: Wed, 27 Sep 89 17:32:30 PDT
- From: pmchen (Peter M. Chen)
- Subject: rest of message
-
- FsrmtDeviceMigrate, server error <40012>
- Warning: ProcMigReceiveProcess: error returned by deencapsulation procedure Fs_DeencapFileState:
- the file handle is out of date.
- FsrmtDeviceMigrate, server error <40012>
- Warning: ProcMigReceiveProcess: error returned by deencapsulation procedure Fs_DeencapFileState:
-
- These are some of the error messages I got. Also, make seems to be hanging. A
- couple of hours ago, make didn't return at all (no error messages). Now, it
- gives the following errors:
- "/sprite/lib/pmake/command.mk", line 383: Warning: Malformed conditional (!empty(DISTDIR))
- "Makefile", line 33: #if-less #else
- "/sprite/lib/pmake/command.mk", line 214: Warning: Extra command line for "MAKECMD" ignored
- "/sprite/lib/pmake/command.mk", line 215: Warning: Extra command line for "MAKECMD" ignored
-
- "/sprite/lib/pmake/command.mk", line 392: #if-less #endif
-
-
-
- 481.
- Date: Wed, 27 Sep 89 18:47:59 PDT
- From: shirriff (Ken Shirriff)
- Subject: mint ipServer hangs / gdb is useless
-
- The ipServer on mint went into the debugger again. The stack trace is
- status.go
- CvtFtoA( bunch of junk )
- Mem_PrintStatsInt
- I tried to debug Mem_PrintStatsInt, but every time I tried to examine
- the variable "i", gdb went into the debugger, so I gave up.
- If anyone wants more details, it's on the console.
-
-
-
- 482.
- Date: Thu, 28 Sep 89 10:44:36 PDT
- From: douglis (Fred Douglis)
- Subject: ds3100 bug: mem_free
-
- kvetching crashed hard with a Mem_Free storage block already free -- wouldn't
- respond to the debugger though it said it entered it okay. if anyone else
- sees this please let me know.
-
-
-
-
- 483.
- Date: Thu, 28 Sep 89 11:43:58 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 X status
-
- I couldn't find anything that has changed in the past day or so, but
- nevertheless, X is suddenly broken. However, /ultrix/cmds/Xcfb.new
- works for me though Xcfb does not. Furthermore, its fonts are set up
- ok for the DEC fonts, though not for the MIT-compatible fonts (which
- are in their own directory with a different fonts.dir file that is
- compatible with the old format). Also, Xcfb still isn't giving me
- color.
-
-
-
-
- 484.
- Date: Thu, 28 Sep 89 12:25:12 PDT
- From: gibson (Garth Gibson)
- Subject: "ar" across NFS
-
- on basil (SPRITE VERSION 1.010 (sun3) (30 Aug 89 17:20:32))
- in /spur/gibson/Csim
- I execute "ar q sun3.md/csim.a sun3.md/*.o"
- and it seems to hang (or at least make no real progress)
- for minutes
- if instead I do "ar q ~/csim.a sun3.md/*.o"
- it works nearly instantly
-
- why should ar hang when the archive is across NFS?
-
- Actually, I think it is the "q" argument (quickly append). If instead I do
- "ar r sun3.md/csim.a sun3.md/*.o"
- it runs in about 15 seconds even over NFS
-
-
-
- 485.
- Date: Thu, 28 Sep 89 13:08:14 PDT
- From: douglis (Fred Douglis)
- Subject: xkill kills X?fb.new
-
- I used xkill and got a segmentation violation in Xcfb.new. Since
- we don't have sources, I don't think there's much I can do. Whoopie!
-
-
-
- 486.
- Date: Fri, 29 Sep 89 10:46:29 PDT
- From: ouster (John Ousterhout)
- Subject: Out of space?
-
- I'm getting the following message in my syslog window, over and over:
-
- 9/29/89 10:45:40 allspice (14) RmtFile "mbox" <2,64776> Write-back failed: out of disk space
-
- But when I do a "df" there appears to be plenty of space on /user1.
-
-
-
- 487.
- Date: Fri, 29 Sep 89 11:12:54 PDT
- From: Fred Douglis <douglis>
- Subject: Re: wall
-
- i reported a bug a few weeks ago that there are hung rlogind processes
- that cause opens of /hosts/*/rlogin* to sometimes get hung. the wall
- process never gets past the open. the file system has to handle hung
- pdevs a little better, i guess.
-
- i think as a temporary measure i will change wall to do all the
- syslogs first, then go back and do the rlogin pdev files afterwards.
- maybe eventually it can fork a child that may or may not finish and
- time out, but it would be better to fix the problem in the kernel
- instead.
-
-
-
- 488.
- Date: Fri, 29 Sep 89 15:43:25 PDT
- From: tve (Thorsten von Eicken)
- Subject: /sprite/* aren't group sprite...
-
- It would certainly help if they were...
-
-
-
- 489.
- Date: Fri, 29 Sep 89 16:51:32 PDT
- From: douglis@ginger.berkeley.edu (Fred Douglis)
- Subject: mint deadlock
-
- after allspice wedged and was rebooted, it was mint's turn. no one could
- log in because access to /sprite/admin/lastLog was hung due to cache
- consistency. a single process was actually in the middle of an rpc to
- parsley, but parsley wasn't usable. parsley responded to pings. seems the
- timeout for client cache consistency didn't kick in, or something. brent:
- what happens if a client just decides to hang the call to start the consistency?
- I presume the timeout only starts once the rpc has finished and you're
- awaiting a callback from the client.
-
- parsley is in the debugger and i'll try to poke around once mint comes back,
- assuming i can login to my own machine successfully for a change.
-
-
-
-
- 490.
- Date: Fri, 29 Sep 89 17:44:02 PDT
- From: Fred Douglis <douglis>
- Subject: more on cache callback problems
-
- assault ran into the same problem -- it locked up ~douglis/.emacs. i
- debugged it and found it was in the middle of an rpc to hijack.
- hijack was actually not responding to rpc pings, and ken said it was
- continually printing the same statement to its syslog (this bug goes
- way back, eh?). when hijack rebooted and assault was continued,
- things got back to normal.
-
-
-
- 491.
- Date: Sat, 30 Sep 89 01:18:11 PDT
- From: tve (Thorsten von Eicken)
- Subject: makedepend -p not used in mkmf
-
- Why doesn't mkmf use the "-p" flag of makedepend? I run into trouble with
- that when I run pmake: it complains "Can't figure out how to make foo.h".
-
- The right -Idirectory flag is passed to makedepend.
-
-
-
- 492.
- Date: Sat, 30 Sep 89 01:36:34 PDT
- From: tve (Thorsten von Eicken)
- Subject: mkmf and #define no_install
-
- there should be a note in the man page about the possibility of
- #defining no_install in local.mk
-
-
-
- 493.
- Date: Sat, 30 Sep 89 01:42:23 PDT
- From: tve (Thorsten von Eicken)
- Subject: mkmf and makedepend, where does DEPFLAGS go?
-
- in /sprite/lib/pmake/command.mk it says at the beginning:
- # DEPFLAGS additional flags to pass to makedepend
- but these do not appear where makedepend is actually called. Maybe I'm
- blind (the whole mkmf stuff is pretty complicated...) but I'll try to
- fix it. I'll leave comments with the string "TvE" around so someone
- can check whether I goofed. Thanks,
- -TvE
- NB: anyway, I think the "-p" flag should always be passed to makedepend, I'll
- try to do that with "DEPFLAGS=-p"...
-
-
-
-
- 494.
- Date: Sat, 30 Sep 89 15:27:23 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: wall bug
-
- I rlogin'd to sage just a few moments ago and got the following wall
- from yesterday:
-
- sage<jhh 2> Broadcast message from douglis@kvetching.Berkeley.EDU at 17:30 ...
- time for assault to be debugged. /user2 will be unavailable temporarily....
- Fred x29669
-
-
-
- 495.
- Date: Sat, 30 Sep 89 18:18:24 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: non-existent FsStats referenced in spritemon
-
- Spritemon no longer compiles because it references a structure called FsStats
- which doesn't exist. Was this renamed in the file system renaming?
-
-
-
-
- 496.
- Date: Sun, 1 Oct 89 13:55:14 PDT
- From: ouster (John Ousterhout)
- Subject: Re: makedepend problems
-
- It's fine to use DEPFLAGS in makedepend calls, and it sounds like
- a bug that it wasn't there before. However, it sounds like Thorsten
- may not have done everything necessary to add the usage of DEPFLAGS.
- If DEPFLAGS are used, then they should default to empty to handle
- the normal case where they're not specified. In command.mk there
- is a group of lines that do this for other flags, like XCFLAGS, LINTFLAGS,
- and so on. Perhaps the best solution is to add DEPFLAGS back into
- command.mk, but also add a line
-
- DEPFLAGS ?=
-
- in the group of lines just after the "#include <tm.mk>" line.
-
-
-
- 497.
- Date: Sun, 1 Oct 89 15:05:39 PDT
- From: ouster (John Ousterhout)
- Subject: Weird /mic behavior
-
- I noticed strange behavior with respect to /mic today... I'm not
- sure whether this is a bug or not. Mace has an old entry in its
- prefix table from last week when /mic existed on Allspice. At
- present, /mic is dismounted and unavailable (and Allspice has
- rebooted in there at some point too). I tried to cd to /mic, and
- saw two unusual things:
-
- 1. The following messages appeared in my syslog window:
-
- open of "/mic" waiting for recovery
- 10/1/89 14:49:13 allspice (14) RmtFile "/mic" <3,2> : stale handle
- 10/1/89 14:49:13 allspice (14) - recovering handles
- 10/1/89 14:49:13 allspice (14) RmtFile "/mic" <3,2> Reopen failed : domain unavailable
- 10/1/89 14:49:14 allspice (14) Recovery complete 140 handles reopened 10 failed reopens
-
- 2. The csh hung, and I had to kill it.
-
- Perhaps it makes sense for the csh to hang, since it's ostensibly waiting
- for /mic to become available, but I don't see why recovery should get
- invoked. This was repeatable: each time I tried to cd to /mic, recovery
- was invoked.
-
- Then I tried "ls /mic", and something different appeared in my syslog
- window:
- Fsprefix_HandleClose nuking "/mic"
- Broadcasting for server of "/mic"
- <prefix> 10/1/89 15:00:48 broadcast (0) RPC timed-out
- Now this seemed much more reasonable: the ls eventually quit with an
- error "/mic unreadable". At this point, "cd /mic" produced the same
- behavior, so apparently the ls unwedged something inside the kernel.
-
- Does "cd" behave differently than reading a file, and perhaps not invoke
- the right level of recovery actions?
- -John-
-
-
- 498.
- Date: Sun, 1 Oct 89 16:24:16 PDT
- From: ouster (John Ousterhout)
- Subject: Re: /sprite/lib/include/command.mk clears .PATH.h
-
- I forget the exact reason why the system .mk files clear .PATH.h,
- but I'm pretty sure it's necessary. I believe that it has to be
- done to guarantee a particular ordering of the include files, but
- it's been a long time since I've thought about this. You're
- right that it makes things tricky for local.mk files.... sigh.
- Some things in the local.mk have to be done BEFORE including
- the SYSMAKEFILE, and some things (like adding to .PATH.h) have
- to be done afterwards. It would probably be better to re-arrange
- the Makefiles some day so everything happens either before or
- after including the SYSMAKEFILE. As you've noticed, many of the
- Makefile features also aren't documented very well (they've
- gradually accreted over time). I wish there were a simpler way for
- all of this, (but given the complicated set of things we want the
- Makefiles to handle, I'm not sure there is).
-
-
-
-
- 499.
- Date: Sun, 1 Oct 89 17:23:26 PDT
- From: shirriff (Ken Shirriff)
- Subject: kgdb.sun4 is strange
-
- The editing controls no longer work correctly for kgdb.sun4. Backspace
- now does some strange nondestructive cursor motion function instead of
- performing the normal backspace function.
-
-
-
- 500.
- Date: Sun, 1 Oct 89 22:16:11 PDT
- From: douglis (Fred Douglis)
- Subject: rpcecho/rpccmd -ping
-
- rpcecho -h pride -d 16384 -n 1000
- Rpc Send Test: N = 1000, Host = pride (6), size = 16384
- N = 1000, Size = 16384, Time = 0.039671
-
- rpccmd -ping pride -b 16384
- Send 16384 bytes 0.020078 sec
-
- I assume the echo is bouncing the entire packet back again, huh?
- but the one-way ping doesn't have the same flexibility for repeating the
- test a variable number of times, etc. since these two programs do
- different things even though they look so similar, perhaps the
- documentation should be clearer? ("rpc send test => rpc bounce test" or
- something?)
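A quick check of the arithmetic supports the guess, assuming both Time figures are per RPC (1000 RPCs of 16384 bytes in 0.04 seconds total would be roughly 400 Mbytes/second, which is implausible):

```latex
\[
\frac{0.039671\ \mathrm{s}}{0.020078\ \mathrm{s}} \approx 1.98,
\qquad
\frac{16384\ \mathrm{bytes}}{0.020078\ \mathrm{s}} \approx 8.16 \times 10^{5}\ \mathrm{bytes/s},
\qquad
\frac{2 \times 16384\ \mathrm{bytes}}{0.039671\ \mathrm{s}} \approx 8.26 \times 10^{5}\ \mathrm{bytes/s}
\]
```

The echo takes about twice as long as the one-way send at nearly the same byte rate, which is what bouncing the whole packet back would predict.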
-
-
-
-
- 501.
- Date: Sun, 1 Oct 89 23:39:48 PDT
- From: douglis (Fred Douglis)
- Subject: tx/pdev bug
-
- I held down ^A a bit to repeat the same command multiple times.
- tx died with the following:
-
- ReplyWithData couldn't send pdev reply; status "address given by the user for a system call was bad"
-
-
-
- 502.
- Date: Mon, 2 Oct 89 02:22:23 PDT
- From: douglis (Fred Douglis)
- Subject: pmake/migration bug w.r.t. high parallelism
-
- when pmake goes past about 10 parallel tasks, it seems to hang fairly reliably.
- no idea why yet. could be machine flakiness (i ran up to 10 based on an rlogin
- to hijack, then needed to use hijack too so ran the pmakes from kvetching, and
- that's when they started hanging. rebooting didn't help. still, 10 seems like
- a funny magic number...)
-
-
-
-
- 503.
- Date: Mon, 2 Oct 89 03:01:23 PDT
- From: douglis (Fred Douglis)
- Subject: new X too unstable
-
- I reported a bug the other day when xkill caused my Xcfb.new server
- to die, right? well, "xhost"
- generated an error when given a hostname, and caused the
- server to die when invoked with no arguments.
-
-
-
- 504.
- Date: Mon, 2 Oct 89 09:13:43 PDT
- From: brent (Brent Welch)
- Subject: Re: Weird /mic behavior
-
- The chdir() by csh does an open which goes through the
- regular recovery stuff in the prefix table routines.
- It appears, however, that the open wasn't correctly
- aborted when the recovery failed due to "domain unavailable".
- There is probably some bug associated with the failure
- to reestablish a prefix table entry. By the time the
- ls was done, then the prefix handle was already marked
- invalid, so the prefix was cleared and another broadcast
- was made. So, the difference between your two cases was
- not due to a difference between 'cd' and 'ls', but between
- the first use of the /mic domain and subsequent ones.
- The first case seems repeatable, and perhaps I'll have
- time to test it on assault or something.
-
-
-
- 505.
- Date: Mon, 2 Oct 89 09:19:59 PDT
- From: brent (Brent Welch)
- Subject: Re: rpcecho/rpccmd -ping
-
- rpcecho -s does a 'send' instead of an 'echo':
-
- Usage of command "rpcecho"
- -n: Number of RPCs to do
- Default value: 100
- -d: Datasize to transmit
- Default value: 32
- -D: Do tests at all sizes
- -e: Echo off RPC server (default)
- -r: Number of reps for each size
- Default value: 10
- -s: Send instead of Echo
- -t: Trace records taken (runs slower)
- -c: High priority
- -h: name of target host
- -help: Print this message
-
-
-
-
- 506.
- Date: Mon, 2 Oct 89 09:22:52 PDT
- From: brent (Brent Welch)
- Subject: Re: tx/pdev bug
-
- ReplyWithData couldn't send pdev reply; status "address given by the user for a system call was bad"
-
- This is a known problem. If the user's buffer is bad tx gets an
- error and aborts. The pdev code needs to be fixed to determine
- which buffer (user's or server's) is bad.
-
-
-
- 507.
- Date: Mon, 2 Oct 89 13:00:06 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: lots of icky sparc station stuff
-
- I knew it would be a useful exercise to try living on a sparc station... Lots
- of stuff seems to have gone haywire since the last time I tried a lot of this.
- And some of these are continuing bugs.
-
- 1) The machine gets in a mode sometimes from a particular csh window where
- everything exec'd from the csh gets a seg fault. This is horrible since it
- probably means something about caches or register windows not being flushed
- at the right time. Brent noticed this happening once on a regular sun4 if
- I'm not mistaken, so this isn't just a sparc station problem. This did not
- happen before, so something has changed to create this mess.
-
- 2) Vi keeps forgetting its TERMCAP and using open mode. I reported this bug
- before.
-
- 3) Some X applications, such as xclock, keep dying in XtConvert().
-
- 4) It's sometimes hard to debug user programs with seg faults, since the
- debugger often seg faults on them. When I can debug them though, it appears
- there was no reason for them to seg fault where they did. This again points
- to a cache or register window flushing problem that isn't updating the stack
- at the right time...
-
-
-
- 508.
- Date: Mon, 2 Oct 89 13:33:00 PDT
- From: eklee (Edward K. Lee)
- Subject: missing directory
-
- One of my directories /sprite/users/eklee/cmds.md seems to have
- mysteriously vanished. It was there Friday but not today.
- I'm not sure when it was last modified (probably a long time ago).
-
-
-
- 509.
- Date: Tue, 3 Oct 89 13:54:35 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: unknown problem with thyme
-
- Thyme got very sluggish on me and a ps -au revealed a process in the
- UNUSD state using 47.7% of the cpu. I put thyme into the debugger
- but was unable to attach to it from allspice. It also ignores
- kmsg -c requests. Thyme was running kernel 1.023. I don't think there
- were any migrations in progress. File this one away for future
- reference.
-
-
-
- 510.
- Date: Tue, 3 Oct 89 15:28:12 PDT
- From: pmchen (Peter M. Chen)
- Subject: corrupted file
-
- My mailbox got corrupted sometime (don't know when):
- Any ideas of what happened?
- I left a copy of the file in ~pmchen/tmp/corruptedmail
-
-
-
- 511.
- Date: Tue, 03 Oct 89 16:28:18 PDT
- From: rab (Robert A. Bruce)
- Subject: piquante
-
- Piquante is in the debugger:
-
- Fatal Error: Software time is ahead of the hardware
-
-
-
- 512.
- Date: Tue, 3 Oct 89 16:37:22 PDT
- From: brent (Brent Welch)
- Subject: Allspice cache crash
-
- Allspice died in the block cache. It apparently found a
- block associated with a previous incarnation of a domain.
- John H. had unmounted a file system and remounted it
- under a different name. I believe that the unmount left
- a block in a funny state in the cache. It was an indirect
- block, or perhaps a block of file descriptors - it thought
- it was associated with the "physHandle" of the domain,
- which is used for indirect blocks and file descriptors.
- However, while the block referenced the physHandle, the
- physHandle didn't reference the block. A panic occurred
- when DeleteBlock tried to take this block away from the physHandle.
- More details: the block was in the LRU list, and it was found
- by FetchBlock. FetchBlock called DeleteBlock in order to
- take the block away from its current owner. DeleteBlock found
- the block in the hash table, but it died trying to remove it
- from the per-file block list (or indirect block list). This
- is code I have stared at in the past. There is no obvious
- place where things could easily get out of whack, but it is
- all rather complex and not obviously correct either.
- I did glance at the Unmount code, and there doesn't seem to
- be any particular attention paid to the cache. A write-back
- is done, but there are no consistency checks made on
- the physHandle associated with the domain. Checks should be
- added - the unmount code is probably the least used code we have.
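The kind of check suggested here might look like the following sketch, run on the domain's physHandle at unmount time. The Block and Handle structures and the function names are hypothetical simplifications, not Sprite's cache types; the point is to verify the ownership link in both directions, since the crash came from a block that referenced the physHandle while the physHandle did not reference the block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache structures (not Sprite's real types). */
typedef struct Block {
    struct Handle *owner;       /* handle the block claims to belong to */
    struct Block *nextInFile;   /* link in the owner's block list */
} Block;

typedef struct Handle {
    Block *blockList;           /* blocks this handle believes it owns */
} Handle;

/* Returns 1 if blk appears on h's block list. */
static int HandleHasBlock(const Handle *h, const Block *blk) {
    for (const Block *b = h->blockList; b != NULL; b = b->nextInFile)
        if (b == blk)
            return 1;
    return 0;
}

/*
 * Check ownership both ways for one handle: every block on the handle's
 * list points back at it, and every cached block that names the handle
 * is on its list. Returns 1 if consistent, 0 otherwise.
 */
int CheckHandleBlocks(const Handle *h, Block *const *cache, size_t numCached) {
    assert(h != NULL);
    for (const Block *b = h->blockList; b != NULL; b = b->nextInFile)
        if (b->owner != h)
            return 0;
    for (size_t i = 0; i < numCached; i++)
        if (cache[i]->owner == h && !HandleHasBlock(h, cache[i]))
            return 0;
    return 1;
}
```

The second loop is the one that would have caught the Allspice case: a block in the cache naming the physHandle without being on its list.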
-
-
-
- 513.
- Date: Tue, 3 Oct 89 20:17:11 PDT
- From: pmchen (Peter M. Chen)
- Subject: transient bug in floating point?
-
- About 15 minutes ago I compiled a program which had always run fine
- and got an odd error from a print statement
-
- printf("tot1=%d, tot=%d, i=%d\n",tot1,tot,i);
- printf("%.2lf %% requests fulfilled in %d ms\n",
- (double)tot1*100.0/tot,i);
- printf("%d %lf %d\n",i,(double)tot1*100.0/tot,i);
-
- produced something like:
- tot1=300, tot=301, i=40
- 99.67 % requests fulfilled in 120385833 ms
- 40 99.6666667 120385833
-
- I'm making this up because I don't have the real output when the program
- was doing this (so the 120385833 is fudged). But it did give garbage there
- instead of "40". It looks like the results of the floating point is
- wrecking the next argument to printf.
-
- I've recompiled it many times and it did this consistently (on a sun3). Then
- I moved to a sun4 and it worked fine. After this, I moved the routine to
- a separate module and recompiled (on the sun3) and it works fine now.
- I am not compiling with floating-point hardware support. The program is ~pmchen/raid/mult
- and the offending routine is printlat (in printlat.c).
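One way to narrow this down is a self-checking version of the second printf. The names FormatLatency and IntSurvivesDouble are hypothetical. On a correct compiler and C library, tot1 = 300, tot = 301, i = 40 produce "99.67 % requests fulfilled in 40 ms"; if the trailing integer comes out as garbage in the failing build, the double argument is being passed incorrectly. The sketch uses %f rather than %lf, since C89 left %lf undefined for printf (C99 made the l a no-op), which removes one variable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the summary line the way printlat's second printf does, but into
 * a buffer so the result can be checked. Returns snprintf's result.
 */
int FormatLatency(char *buf, size_t len, int tot1, int tot, int i) {
    assert(tot != 0);
    return snprintf(buf, len, "%.2f %% requests fulfilled in %d ms",
                    (double)tot1 * 100.0 / tot, i);
}

/* Returns 1 if the int that follows the double came through intact. */
int IntSurvivesDouble(int tot1, int tot, int i) {
    char got[128], want[32];
    FormatLatency(got, sizeof got, tot1, tot, i);
    snprintf(want, sizeof want, " in %d ms", i);
    size_t lg = strlen(got), lw = strlen(want);
    return lg >= lw && strcmp(got + lg - lw, want) == 0;
}
```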
-
-
-
-
- 514.
- Date: Wed, 04 Oct 89 09:54:34 PDT
- From: rab (Robert A. Bruce)
- Subject: pepper
-
- Pepper is in the debugger:
- Fatal Error: Trying to broadcast non-prefix
-
-
-
- 515.
- Date: Wed, 4 Oct 89 12:29:19 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: uwm dies
-
- My uwm dies using Xmfb.new. It doesn't go into the debugger, it just
- goes away. Any new ones I start just sit there and do nothing.
-
-
-
- 516.
- Date: Wed, 4 Oct 89 16:41:20 PDT
- From: brent (Brent Welch)
- Subject: File server lock-out
-
- You can fully occupy the attention of a Sprite file server
- by writing a huge file. The new SCSI interface happily
- queues up a zillion blocks, and then the SCSI interrupt
- handler chains through the blocks writing each one.
- In the meantime the server doesn't do much else.
- I noticed this the other day when pounding on assault,
- and it happened again today when John H tried to write
- a huge file to test out a new disk. My innocent editor
- write-back hung until his job was aborted. You can also experience
- this by trying to use Oregano as a workstation. I haven't
- fully diagnosed the problem with the debugger or anything,
- but I think that between the disk interrupts and the
- block cleaner things are effectively blocked out
- of the file system cache. I'm not sure exactly, but
- perhaps my write couldn't complete because the server
- couldn't read an indirect block until the file currently
- being written out cleared the disk queue.
-
- Adding interrupt priorities would only help mouse response
- when the disk is busy, and perhaps this isn't that important.
- I'm not sure what to do about the disk queue. Perhaps we can
- throttle the block cleaner so it only does N blocks of a file at
- a time (the cleaning is done on a per-file basis) so that other
- cache I/O's can slip in. This is much like the old problem
- we had where the disk queuing wasn't fair at all, and once
- the block cleaner got a hold of it it didn't let go until
- it was done. Now the block cleaner is free to queue up
- the whole cache!
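The throttling idea can be sketched as a round-robin schedule over dirty files: each file gets at most maxPerTurn blocks queued before the cleaner moves on to the next file. DirtyFile, IssuedWrite, and ScheduleRoundRobin are hypothetical; the sketch only computes the order in which writes would be queued and does not model Sprite's block cleaner or disk queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file cleaning state. */
typedef struct {
    int fileId;
    int dirtyBlocks;   /* blocks still waiting to be written */
    int nextBlock;     /* next block number to queue */
} DirtyFile;

/* One queued write, in the order it would reach the disk queue. */
typedef struct {
    int fileId;
    int blockNum;
} IssuedWrite;

/*
 * Queue at most maxPerTurn blocks of each file per pass, cycling through
 * the files until all are clean (or `out` is full).
 * Returns the number of writes placed in `out`.
 */
size_t ScheduleRoundRobin(DirtyFile *files, size_t numFiles, int maxPerTurn,
                          IssuedWrite *out, size_t maxOut) {
    size_t count = 0;
    int progress = 1;

    assert(maxPerTurn > 0);
    while (progress && count < maxOut) {
        progress = 0;
        for (size_t i = 0; i < numFiles && count < maxOut; i++) {
            for (int n = 0; n < maxPerTurn && files[i].dirtyBlocks > 0 &&
                            count < maxOut; n++) {
                out[count].fileId = files[i].fileId;
                out[count].blockNum = files[i].nextBlock;
                count++;
                files[i].nextBlock++;
                files[i].dirtyBlocks--;
                progress = 1;
            }
        }
    }
    return count;
}
```

With a 5-block file and a 2-block file and maxPerTurn = 2, the small file's blocks are queued third and fourth instead of after all five blocks of the big one.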
-
-
-
- 517.
- Date: Wed, 4 Oct 89 17:50:50 PDT
- From: tve (Thorsten von Eicken)
- Subject: ds3100 ld spits out "LINK EDITOR MAP" on "ld -r"
-
- Yeah, I have a "bigcmd" directory. I type mkmf and pmake and at the end
- when it comes to the link, it does it and then spits the LINK EDITOR MAP
- at me. Is this a feature?
-
-
-
- 518.
- Date: Wed, 4 Oct 89 23:53:09 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: assault runs out of memory
-
- Assault runs out of memory if you get too many file handles.
-
-
-
-
- 519.
- Date: Thu, 5 Oct 89 13:19:51 PDT
- From: brent (Brent Welch)
- Subject: signal/proc deadlock
-
- Garth found Basil in a deadlock today. I hunted around for
- a while and deduced that there was a deadlock between
- the Sig:sigLock and the Proc:tableBlock. I didn't fully
- figure the deadlock out, as I simply stopped after spending
- a half an hour or so looking around. Basil had many
- processes in the debug state, by the way. There were
- also a couple of processes trying to send signals, including
- an Rpc_Server from some remote host. Finally, the Xsprite
- process was locked, but I couldn't quite figure out who
- had it locked.
-
- With the 'holderPC' and 'holderPCBPtr' we ought to
- have enough information to figure these deadlocks out.
- (In fact, having this really helps a lot.) However,
- it is still tedious although slightly less time consuming.
-
- Is it hopeless to hope for improved debugger support?
- I am fearful that the difficult bugs in Sprite will
- not be solvable in our current environment, especially
- as the experts/implementors begin to leave. This is
- a strong plea for better attention to the debugging
- facilities. For example, it is still probabilistic
- whether you can examine a local variable in gdb.
- Sometimes you just get "Error: invalid address 0".
- It is also painful to examine 30+ processes to
- determine what the deadlock is. Or, for another example,
- if a machine hangs while trying to enter the debugger
- (i.e. the cache-lock is held so you can't sync the disks)
- then you have to manually scan through all the processes
- and see which one got the panic. It is little things
- like this that conspire against good debugging. It's
- too bad that none of us want to work to improve the
- debugging environment (hint hint). I think there is
- lots of room for improvement. Flame off.
-
-
-
- 520.
- Date: Fri, 6 Oct 89 08:39:34 PDT
- From: ouster (John Ousterhout)
- Subject: Crash and disk space
-
- When I came in today Allspice was catatonic: it didn't respond to
- its keyboard at all and wasn't responding to rpc requests. I gave
- up and rebooted it. Also, disk space was empty on /sprite/src/kernel.
- In order to unwedge Mace (which was apparently hung trying to write
- back something from a migrated process), I deleted the sun3.1.023
- kernel (it didn't appear to me to be in use any more).
-
-
-
- 521.
- Date: Fri, 6 Oct 89 12:06:28 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: tx window in the debugger
-
- My tx window with a long-standing kernel debugging session in it just went
- into the debugger. I don't think I did anything weird except that I typed
- a return key in it for the first time after a number of hours.
-
-
-
- 522.
- Date: Fri, 6 Oct 89 16:24:30 PDT
- From: brent (Brent Welch)
- Subject: Mint crash Friday
-
- As you probably know, mint had a rough afternoon on Friday.
- The underlying cause is that the bug I attempted to
- fix concerning scavenging a handle for a file that is
- being deleted was not fixed, apparently. Mint was deleting
- a file in /tmp and got a bus error because a handle didn't have a file
- descriptor attached to it (a sign of scavenging).
- Interestingly, fscheck didn't complain (this time) about
- the file that was in the process of being deleted.
- Mint then had troubles during recovery. After the
- very first round of re-opens it simply hung - lots of processes
- in the ready state, and an lpd process in the running state.
- I rebooted, and this time fscheck found that the tmp file
- which caused the first crash referenced a non-allocated file descriptor.
- Anyway, towards the very end of recovery #2 mint crashed again,
- this time with a different bug related to local file handles,
- another one I had thought I'd fixed. This bug concerns
- what happens when the handle table fills up - there is a window
- of time where a handle is partially installed, and apparently
- the wrong guy got it back. (That's a hand-wavy explanation.
- The problem is probably in Fsutil_HandleInstall.)
- Now for the fun part. The next reboot sequence failed with
- the following message:
- Unknown user brent (!!)
- It turns out that /etc/passwd got truncated (yow!),
- I was the owner of /sprite/cmds/csh, and csh couldn't
- execute the /boot/bootcmds script because of no /etc/passwd.
- Luckily we could access the other servers from the single
- user shell, and we copied /t1/etc/passwd to /etc/passwd,
- sourced the boot script, and we seemed to be back in business.
- The third time is the charm, as they say, and mint was
- able to make it through recovery ok. I'll go look at
- my brain-damaged code that concerns local file handles,
- as mint crashed in two different ways in this area.
-
-
-
-
- 523.
- Date: Sun, 8 Oct 89 10:32:49 PDT
- From: ouster (John Ousterhout)
- Subject: Mint crash
-
- When I came in this morning Mint was not responding to RPC requests.
- I went up to the machine room and discovered that Allspice was out
- of disk space on /user1, and Mint had used up all its console paper
- printing out disk full messages for files it was trying to write
- to /user1. This apparently had hung Mint? I added more paper to
- the console, at which point Mint printed a bunch of unintelligible
- garbage on the console and then went catatonic (no response whatsoever
- to the console). At this point I rebooted Mint. Unfortunately,
- many of the clients did not recover ("Recovery failed <30002>").
- I then rebooted Mint a second time, but many clients still didn't
- recover. Fortunately, piracy was one of the lucky ones. I then
- used piracy to free up disk space on /user1, and when I did that
- Mace then recovered. I don't know whether the lack of disk space
- somehow impacted recovery or this was just a coincidence.
-
-
-
-
- 524.
- Date: Sun, 8 Oct 89 13:39:27 PDT
- From: ouster (John Ousterhout)
- Subject: Kgdb and registers
-
- It doesn't appear to be possible to set register values from Kgdb.
- When Mendel and I tried this today we ended up with the value "4"
- in the register, which wasn't at all what we thought we were
- storing.
-
-
-
- 525.
- Date: Sun, 8 Oct 89 13:41:19 PDT
- From: ouster (John Ousterhout)
- Subject: Sun-4, interrupts, and debugging
-
- If a Sun-4 is forced into the debugger with "kmsg -d", and is then
- debugged with kgdb, kgdb does not correctly identify the stack
- frame that was active when the network interrupt occurred. This
- makes it very hard to locate an infinite loop in the kernel, for
- example. Mary, can you fix the interrupt code to fudge enough
- information on the stack so that Kgdb can correctly identify the
- frame that was interrupted?
-
-
-
-
- 526.
- Date: Sun, 8 Oct 89 20:10:11 PDT
- From: pmchen (Peter M. Chen)
- Subject: pmake
-
- I get the following error message from pmake clean
- --- tidy ---
- rm -f %(sh: syntax error at line 1: `(' unexpected
- *** Error code 2
- pmake: 1 error
-
- I had just 'pm mkmf'-ed this directory. The offending directory is
- ~pmchen/simul, and this error occurred on anise and on mustard (with TM=sun4).
-
-
-
-
- 527.
- Date: Sun, 8 Oct 89 20:15:26 PDT
- From: pmchen (Peter M. Chen)
- Subject: floating point error?
-
- I have another program with really weird errors. Floating point variables
- get changed by miscellaneous program statements (such as a printf statement).
- This happens on the sun3's (mustard), compiled with hardware floating point.
-
- It doesn't happen on the ds3100's. I don't know whether it happens on the
- sun4's or not (see previous message to bugs about sun4 pmake problems).
-
- The problem does NOT happen using software floating point on the sun3's.
-
- The program is ~pmchen/simul/simul. You can produce the error with
- simul -d 1 -q 1 -i 2 -r 0
-
- Watch for the NaN outputs.
-
-
-
-
- 528.
- Date: Fri, 6 Oct 89 10:09:12 PDT
- From: pmchen (Peter M. Chen)
- Subject: problem in allspice
-
- I am using the Sprite FS in, shall we say, out of the ordinary ways, i.e.,
- writing thousands of files to one directory. I was running simulations
- on parsley which output lots of small files to ~pmchen/simul/out/small.
-
- The csh script I ran is in ~pmchen/simul/ex/small.
-
- This ran fine (to completion) last night on parsley, but might be the cause
- of the problems this morning. As instructed by John O., I F1-A'ed parsley
- so we could see if allspice stays up for a while. Of course, Randy's
- machine is thus unavailable.
-
-
-
-
- 529.
- Date: Mon, 09 Oct 89 06:34:21 PDT
- From: rab (Robert A. Bruce)
- Subject: allspice
-
- When I came in this morning allspice was frozen. It didn't
- respond to the keyboard or to the network. There were no
- error messages on the screen. /user1 was being dumped when
- it died.
-
-
-
- 530.
- Date: Mon, 9 Oct 89 12:28:00 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Re: Sun-4, interrupts, and debugging
-
- [I sent this yesterday, but it seems that at least neither Fred nor Mendel
- got it. I think something went wrong with fenugreek's sendmail or whatever.]
-
- It sounds to me like people don't have quite the picture of how the register
- windows and stack frames work on the sun4. The problem is not in the kernel.
- We can easily fix the problem, and will do so, but it shouldn't mean changing
- what's in a trap frame, and there's really no such thing as "fudging
- enough information" since an interrupt frame is just a trap frame on
- the sun4 (because interrupts are just asynchronous traps on the sun4).
- I think everybody agreed this was a nice clean way of doing it and changing
- this right now would involve reworking a lot of stuff.
-
- Here's what the debugger is getting confused about: as it traces back along the
- stack, looking at each frame as if it's a C call frame, it looks for the pc of
- the calling routine in %i7. This is %o7 of the previous register window. If a
- trap occurs, the register window gets bumped forward one (by the hardware) and
- various values are stuffed into registers in the new register window (by the
- hardware). It's this trap frame that the debugger sees. The problem is that
- the pc of where the trap occurred gets put into %l1 (by the hardware) instead
- of into %i7. This confuses the debugger since it doesn't special-case the trap
- frame. But I can't stuff the pc into %i7, since that's part of the state we
- can't overwrite. So, in %i7, the debugger usually finds the pc of the routine
- that last made a procedure call from that window. What we can do instead, is
- have the debugger recognize the range of pc's for the trap (and interrupt)
- handlers, and if it finds such a pc in a %i7, it can special-case what to do
- with the stack frame before that, since it will be a trap frame and not a C
- call frame.
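-
- A minimal C sketch of that special case, for illustration only. The Window
- and PcRange types, the handler bounds, and Backtrace() are hypothetical,
- not kgdb's or Sprite's code. The only facts taken from the message are that
- in a C call frame the caller's pc is found in %i7, while in a trap frame the
- pc of the interrupted instruction is in %l1 and %i7 is stale. (Real SPARC
- code returns to %i7+8; the sketch simply treats %i7 as the call-site pc.)

```c
#include <stddef.h>
#include <stdint.h>

/* Saved register window, reduced to what this sketch needs. */
typedef struct Window {
    uint32_t l1;                 /* trap window: pc of interrupted instruction */
    uint32_t i7;                 /* C frame: pc of the call in the caller */
    const struct Window *prev;   /* next-older window */
} Window;

/* Address range [start, end) of the trap and interrupt handler code. */
typedef struct {
    uint32_t start;
    uint32_t end;
} PcRange;

static int InTrapHandler(uint32_t pc, PcRange handlers)
{
    return pc >= handlers.start && pc < handlers.end;
}

/*
 * Starting from window w, which is executing at pc, record up to max pcs,
 * innermost first. Returns the number of pcs recorded.
 */
static size_t Backtrace(const Window *w, uint32_t pc, PcRange handlers,
                        uint32_t *pcs, size_t max)
{
    size_t n = 0;

    while (w != NULL && n < max) {
        pcs[n++] = pc;
        if (InTrapHandler(pc, handlers)) {
            /* w is a trap frame: the interrupted pc is in %l1; its %i7
             * only says what the interrupted routine last called. */
            pc = w->l1;
        } else {
            pc = w->i7;          /* ordinary C call frame */
        }
        w = w->prev;
    }
    return n;
}
```

- With this check, a trace started inside a routine called by the interrupt
- handler walks back into the interrupted routine (say, the one stuck in an
- infinite loop) instead of into whatever that routine last called.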
-
-
-
- 531.
- Date: Tue, 10 Oct 89 23:01:11 PDT
- From: shirriff (Ken Shirriff)
- Subject: Bug in Proc_AddMigDependency?
-
- Proc_AddMigDependency (procMigrate.c), line 182, calls HashFind(table,
- (Address) processID), which calls Hash, which uses the second argument
- as a pointer to the string to hash. Since the processID doesn't point
- to a valid string, this crashes.
-
- This happened when I tried to do a pmake running a new kernel of mine.
- The stack trace is MachSysCall->MachUserReturn->Sig_Handle->Proc_MigrateTrap
- ->Proc_AddMigDependency->Hash_Find->hash.Hash.
- As far as I can tell this bug has always been in there, but I don't know
- why things have worked up until now. Maybe my kernel is confusing
- something?
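-
- A C sketch of why that call fails. A string-keyed table hashes a key by
- reading characters through it, so a process ID cast to Address gets
- dereferenced as a wild pointer; a table meant for one-word keys has to hash
- the key's value instead. HashString(), HashWord() and NUM_BUCKETS are
- hypothetical illustrations, not Sprite's Hash_ library, and the report
- doesn't say which key type the migration table was created with.

```c
#include <stdint.h>

#define NUM_BUCKETS 64u   /* hypothetical table size (a power of two) */

/* String keys: every character is read through the key pointer, so the
 * key must point at a NUL-terminated string. */
static unsigned HashString(const char *key)
{
    unsigned h = 0;

    while (*key != '\0') {
        h = h * 31u + (unsigned char)*key++;
    }
    return h & (NUM_BUCKETS - 1);
}

/* One-word keys: the value itself is hashed and nothing is dereferenced,
 * which is what an integer such as a process ID needs. */
static unsigned HashWord(uintptr_t key)
{
    return (unsigned)(key % NUM_BUCKETS);
}
```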
-
-
-
- 532.
- Date: Tue, 10 Oct 89 23:56:22 PDT
- From: Fred Douglis <douglis>
- Subject: ds3100 flakiness returns
-
- things are acting weird again. for example, a couple of times today i
- had cc's returning exit statuses of 1 with no warnings, where a
- recompile went fine; i had one set of cc's complain about typedefs not
- existing when they were fine (again, recompiling worked fine); and
- finally i spent a half hour trying to boot a new kernel, hitting
- "Enabling timer interrupts" early in the boot sequence and then dying.
- I tried different combinations of reset+init+bootpath+etc without
- help. finally i relinked my kernel and it worked just fine.
-
-
-
- 533.
- Date: Wed, 11 Oct 89 14:14:30 PDT
- From: Fred Douglis <douglis>
- Subject: repeated recovery
-
- when mint froze up before, i got a bunch of "cacheable/busy" conflict
- messages and then recovery over and over. Finally, once things
- started to clear up, I was down to a tight loop of recovery followed
- by a stale handle on a file that was accessed by a process that went
- into the debugger as soon as mint started responding again. I'll send
- brent my syslog with a copy to the sprite-log -- no need to burden
- everyone else with it, since it's very long.
-
-
-
- 534.
- Date: Wed, 11 Oct 89 20:57:50 PDT
- From: Fred Douglis <douglis>
- Subject: /dev/syslog truncation bug
-
- I was able to test out my syslog change on sun4s, and while trying to
- exercise the bug I ran into something else. It seemed that if I
- suspended something reading /dev/syslog, and I wrote lots of stuff to
- syslog in one operation, I could overflow the syslog and cause an old
- kernel to go into an infinite loop as expected. But, both old and new
- kernels had another problem: if I said "cat xyz > /dev/syslog"
- repeatedly, each one would overwrite the previous one rather than
- filling the buffer and overflowing! After lots of head scratching I
- found out that the ioctl interface for syslog clears the buffer, and
- csh opens /dev/syslog with truncation set. This means that it would
- be possible to lose stuff from the syslog if it got truncated before
- the reader got in to get the data. I'm going to remove support for
- IOC_TRUNCATE; speak up if you can think of a case to reinstate it.
-
-
-
- 535.
- Date: Wed, 11 Oct 89 21:39:27 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: evil black blob lives!!
-
- I've got one of those nasty black blobs that extends from my cursor
- to the right edge of my tx window on hijack. I was under the
- impression this was fixed, but evidently the blob knows differently.
- It is now immune to 'clear'.
-
-
-
- 536.
- Date: Thu, 12 Oct 89 10:29:02 PDT
- From: Fred Douglis <douglis>
- Subject: large selection doesn't work
-
- If I select a large region, and then use "select" to write it to a
- file, nothing gets produced. I'm pretty sure this worked as of a few
- weeks ago. If I select several lines at a time, things work okay.
-
-
-
- 537.
- Date: Thu, 12 Oct 89 11:27:33 PDT
- From: tve (Thorsten von Eicken)
- Subject: something wrong with mail: /sprite/spool/mqueue not found
-
- On the sun4's (burble, allspice) shortly after I send mail, I get an error
- message on my tty saying:
- queuename: Cannot create "qf~Z210967" in "/sprite/spool/mqueue": no such file or directory
- This does not happen on ds3100, (nor sun3s I think).
-
-
-
- 538.
- Date: Wed, 11 Oct 89 16:32:06 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: ranlib dies on sun4
-
- Ranlib gets a segfault on the sun4 in the routine stash() at line 309
- when it dereferences s->n_un.n_name. The address is out of bounds (0xfe15280c).
-
-
-
- 539.
- Date: Thu, 12 Oct 89 13:01:40 PDT
- From: ouster (John Ousterhout)
- Subject: Mail file trashed
-
- The last few bytes of my mail file got lost today. The result
- was a partial header from Mary, followed by a header and message
- from a 60B student. By the time I noticed it, the mail file had
- already been modified a couple of times, so I didn't bother to
- save the damaged copy. Mary, if the message you sent just after
- the one about "tx search dies on a sun4" is important for me to
- see, could you resend it?
-
-
-
- 540.
- Date: Thu, 12 Oct 89 13:35:16 PDT
- From: mendel (Mendel Rosenblum)
- Subject: wall kills rlogin
-
- Brent's last wall message terminated a rlogin from murder to anise. The
- message:
-
- anise% df .
- Prefix Server KBytes Used Avail % Used
- /mnt anise 284000 3148 252452 1%
- anise% Broadcast message from brent@oregano.Berkeley.EDU at 13:16 ...
- Sayonara - rebooting after 20 days of uptime
- to test recovery and the new kernel
-
-
-
- 541.
- Date: Thu, 12 Oct 89 13:36:12 PDT
- From: mendel (Mendel Rosenblum)
- Subject: wall kills rlogin
-
- Brent's last wall message terminated a rlogin from murder to anise. The
- message:
-
- anise% df .
- Prefix Server KBytes Used Avail % Used
- /mnt anise 284000 3148 252452 1%
- anise% Broadcast message from brent@oregano.Berkeley.EDU at 13:16 ...
- Sayonara - rebooting after 20 days of uptime
- to test recovery and the new kernel
-
- PdevServiceRequest: bad request on request stream: 540095032
- Connection closed.
- murder%
-
-
-
- 542.
- Date: Thu, 12 Oct 89 17:53:21 PDT
- From: brent (Brent Welch)
- Subject: FS deadlock found
-
- I think I have figured out the deadlock that has killed
- mint the past few times. It occurs during times of heavy
- load because a client responds to a call-back too fast,
- and locks are acquired (released, actually) in the wrong
- order. I need to take off for dinner, but it would be
- nice if I could have some time to truly verify this
- out a correct fix for the new .new kernels. If Mary wants
- to use things as is and reboot Allspice with a better
- sun4 kernel (perhaps sun4.mgbaker) that would be ok.
- Currently mint and oregano are running sun3.brent (BW.151)
- which has my other RPC/RECOV/FS fixes in.
-
-
-
- 543.
- Date: Thu, 12 Oct 89 18:01:48 PDT
- From: mendel (Mendel Rosenblum)
- Subject: slow source listing in gdb.new
-
- The reason that the new gdb lists source lines so slowly on Sprite is
- that it calls the library routine isatty() for each character displayed.
- On unix the isatty() routine takes around 100-200 microseconds while it
- takes 2-4 milliseconds on Sprite. The reason is that Sprite forwards the
- ioctl to the terminal driver using pdevs.
-
-
-
- 544.
- Date: Thu, 12 Oct 89 18:37:45 PDT
- From: mendel (Mendel Rosenblum)
- Subject: cc1.68k dies
-
- cc1.68k dies on the following code fragment from the net module.
-
- NetIERecvUnitInit()
- {
- volatile struct {
- char recvUnitStatus:7 ;
- } *scbPtr;
-
- scbPtr->recvUnitStatus;
- }
-
-
-
- 545.
- Date: Thu, 12 Oct 89 18:44:44 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: ipServer and deadlock
-
- The ipServer on covet died. When I killed the inetd and ipServer in
- preparation to restart the ipServer, covet went into the debugger with
- deadlock on schedMutex. I wrote down the pc, etc, in case anyone is
- interested.
-
-
-
- 546.
- Date: Fri, 13 Oct 89 11:17:27 PDT
- From: brent (Brent Welch)
- Subject: Re: vmPageTableInc bug was List problem
-
- I added a list and wasn't using the List macros right,
- which resulted in me trashing vmPageTableInc. I seem
- to do this every time I add a new list, because if
- you aren't careful you end up using the list header
- as a list element. The List_ macros are happy to
- return you the list header, which is dangerous. If you
- don't use LIST_FORALL, you have to use the following
- code sequences to get the first element, then the next:
-
- /* Get the first element of the list, or NIL if the list is empty */
- if (List_IsEmpty(recovPingList)) {
- pingPtr = (RecovPing *)NIL;
- } else {
- pingPtr = (RecovPing *)List_First(recovPingList);
- }
-
- /* Get the next element of the list, or NIL if at the end of the list */
- pingPtr = (RecovPing *)List_Next((List_Links *)pingPtr);
- if (List_IsAtEnd(recovPingList, (List_Links *)pingPtr)) {
- pingPtr = (RecovPing *)NIL;
- }
-
- brent
- ps. You can't use LIST_FORALL if the list can change dynamically.
- In this case I have a list that can grow, so I use a monitor to
- control list iteration and addition of items to the list. Anyway,
- I ended up using the list header as a list element....
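-
- A minimal C sketch of the same idea, assuming a circular doubly linked list
- whose header is a sentinel, as with List_Links. The names Links, ListInit,
- ListAppend, ListFirstOrNull and ListNextOrNull are hypothetical, and NULL
- stands in for Sprite's NIL. Doing the empty and end-of-list tests inside
- the accessors means the header can never be handed back as an element.

```c
#include <stddef.h>

/* Circular, doubly linked list; the header is a sentinel, not an element. */
typedef struct Links {
    struct Links *prev;
    struct Links *next;
} Links;

static void ListInit(Links *hdr)
{
    hdr->prev = hdr->next = hdr;
}

/* Insert elem at the tail, i.e. just before the header. */
static void ListAppend(Links *hdr, Links *elem)
{
    elem->prev = hdr->prev;
    elem->next = hdr;
    hdr->prev->next = elem;
    hdr->prev = elem;
}

/* First element, or NULL if the list is empty; never the header. */
static Links *ListFirstOrNull(Links *hdr)
{
    return (hdr->next == hdr) ? NULL : hdr->next;
}

/* Element after elem, or NULL at the end of the list; never the header. */
static Links *ListNextOrNull(Links *hdr, Links *elem)
{
    return (elem->next == hdr) ? NULL : elem->next;
}
```

- As in the casts above, a record such as RecovPing would keep its links as
- its first member so an element pointer can be cast to and from Links *.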
-
-
-
- 547.
- Date: Fri, 13 Oct 89 11:45:54 PDT
- From: ouster (John Ousterhout)
- Subject: Second gateway
-
- I sent mail to Herve DaCosta asking about getting a second gateway
- out of the SPUR net to replace ji. There's already a machine in
- the works for this, called "csgw2". It should be on-line in the
- not-too-distant future.
-
- On a related note, Brian Shiratsuki asked if Sprite is capable of
- switching name servers if the first choice doesn't respond. I
- don't know if we do this, but if it isn't hard to implement it
- seems like a good idea. Thus if csgw is down we could switch to
- ginger or csgw2.
-
-
-
-
- 548.
- Date: Fri, 13 Oct 89 12:03:22 PDT
- From: Fred Douglis <douglis>
- Subject: profiling broken
-
- user-level profiling (on sun3s) is not recording run-time PC sampling.
- I can get a call graph but not how much time is spent in each routine.
- (I've talked to Bob about this, but I wanted to file an official bug
- report too.)
-
-
-
- 549.
- Date: Fri, 13 Oct 89 12:21:55 PDT
- From: tve (Thorsten von Eicken)
- Subject: lost mail to bugs 'cause of mail problem
-
- (the /sprite/spool/mqueue not found on sun4's stuff...)
- I'll remail everything, pardon if something arrives twice.
-
-
-
- 550.
- Date: Fri, 13 Oct 89 12:23:55 PDT
- From: tve (Thorsten von Eicken)
- Subject: The mail problem on sun4's
-
- (It always says something like:
- queuename: Cannot create "qf~Z275756" in "/sprite/spool/mqueue": no such file or directory
- )
- I guess it has to do with /sprite/spool/mqueue being owned by root, group
- wheel and NOT world-writable.
-
-
-
- 551.
- Date: Fri, 13 Oct 89 12:26:27 PDT
- From: tve (Thorsten von Eicken)
- Subject: group sprite
-
- I know it's a pain to keep track of what group files belong to, but:
- if someday the world gets reorganized (with the new disks), could the person(s)
- doing that take care of which group the files/dirs get put into?
- Those who don't have a "su" window on their screen will thank you! (hehe..)
-
-
-
- 552.
- Date: Fri, 13 Oct 89 13:11:07 PDT
- From: Fred Douglis <douglis>
- Subject: _extendsfdf2 missing
-
- I tried to link a new copy of something using libc_p. it found
- everything but _extendsfdf2. I looked for this in libc and saw that
- there was an object file in gnulib/sun3.md/oldobjs but nothing in
- sun3.md itself. _extendsfdf2.po is a link to a nonexistent
- _extendsfdf2.o in sun3.md. i suspect if we were to remove libc.a at
- this point and remake the library from scratch (as is done every so
- often), a lot of programs might not link anymore.
-
- i just checked for other missing links, and _builtin_new, _lshrsi3,
- _subsf3, and _varargs all suffer from the same problem.
-
- anyone know what happened here?
-
-
-
-
- 553.
- Date: Fri, 13 Oct 89 13:22:11 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: fs bug
-
- Oregano just printed the following to its syslog:
-
- BlockIOProc: firstSector(1862854) > lastSector (630107)
- BlockIOProc: firstSector(1862854) > lastSector (630107)
- ...
- BlockIOProc: firstSector(7803064) > lastSector (630107)
- BlockIOProc: firstSector(4644646) > lastSector (630107)
-
- Somebody thought the disk was bigger than it actually was. It looks like
- BlockIOProc returns SUCCESS in this case. Why doesn't it panic, or
- at least return failure?
-
-
-
- 554.
- Date: Fri, 13 Oct 89 13:36:51 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: rlogin trashed
-
- /sprite/cmds.sun3/rlogin was overwritten with garbage at about 1:00 pm.
- I noticed at about 12:59, at which point the descriptor had been
- modified at 12:58:20. The last descriptor modified time was 13:05:02.
- I've moved the file to /sprite/trashed. I can't make any sense of its
- current contents so I have no idea who did it.
-
-
-
- 555.
- Date: Fri, 13 Oct 89 13:41:32 PDT
- From: Fred Douglis <douglis>
- Subject: Re: rlogin trashed
-
- the first string in the trashed file is a line from the loadavg
- daemon. looks like recovery got confused. in fact, i'll bet i know
- why: fenugreek was in the debugger, and i wanted to use it, and i had
- no idea why brent (?) threw it into the debugger around 8am today so i
- figured i'd continue it and see what happened. that's about the time
- the problem arose, now that i think of it. also, rlogin was
- continually being updated.
-
- the string occurs at offset 0, which is odd. i would expect it to be
- offset (8*187), which would be host 8's entry in the database file, or
- at offset 0 in /hosts/fenugreek/migInfo, which is in a different
- domain.
-
-
-
-
- 556.
- Date: Fri, 13 Oct 89 13:48:36 PDT
- From: brent (Brent Welch)
- Subject: Re: fs bug firstSector > lastSector
-
- BlockIOProc: firstSector(4644646) > lastSector (630107)
-
- Somebody thought the disk was bigger than it actually was. It looks like
- BlockIOProc returns SUCCESS in this case. Why doesn't it panic, or
- at least return failure?
-
- The server shouldn't panic, of course. What it does is return
- SUCCESS and zero bytes transferred, because this emulates what
- happens when you try to read past end-of-file.
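-
- A C sketch of that behavior, assuming an in-memory disk of diskSectors
- sectors. ReadSectors() and SECTOR_SIZE are hypothetical. A request that
- starts past the last sector transfers nothing and still "succeeds", like
- read() at end-of-file; clipping a request that straddles the end of the
- disk is an added assumption, not something the message describes.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512   /* hypothetical sector size in bytes */

/*
 * Copy sectors [firstSector, firstSector + count) of an in-memory disk
 * holding diskSectors sectors into buf. Returns the number of bytes copied.
 */
static size_t ReadSectors(const unsigned char *disk, size_t diskSectors,
                          size_t firstSector, size_t count, unsigned char *buf)
{
    if (firstSector >= diskSectors) {
        return 0;                            /* past the end: 0 bytes, no error */
    }
    if (count > diskSectors - firstSector) {
        count = diskSectors - firstSector;   /* clip at the end of the disk */
    }
    memcpy(buf, disk + firstSector * SECTOR_SIZE, count * SECTOR_SIZE);
    return count * SECTOR_SIZE;
}
```

- The sketch also shows the cost John is pointing at: the caller sees only a
- short count and cannot tell a bogus sector number from a legitimate read
- that ran off the end.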
-
-
-
- 557.
- Date: Fri, 13 Oct 89 13:49:15 PDT
- From: rab (Robert A. Bruce)
- Subject: dump
-
- The tape drive isn't working. When I try to access it I get
-
- /hosts/murder/dev/exabyte.norewind: connection timed out
-
- and this message appears on murder's console:
-
- Warning: SCSI3 can't select SCSI3#0 Target 5 LUN 0
-
- I checked all the cables and everything seems to be okay. I tried
- power cycling the tape drive, and tried a couple different tapes.
- Then I tried booting an old kernel, but that didn't help either.
-
- Since the tape didn't work, I put this morning's dump into
- /t6/dump.lev1.13Oct.
-
-
-
- 558.
- Date: Fri, 13 Oct 89 13:55:29 PDT
- From: pmchen (Peter M. Chen)
- Subject: decstation cc error
-
- I was in ~pmchen/verses/verse, and issued pm on forgery. Here's what happened:
-
- forgery% pm
- --- ds3100.md/verse.o ---
- rm -f ds3100.md/verse.o
- cc -g3 -O -Dds3100 -Dsprite -Uultrix -I/users/pmchen/lib/include -I. -Ids3100.md -I/sprite/lib/include -I/sprite/lib/include/ds3100.md -c verse.c -o ds3100.md/verse.o
- ccom: Warning: verse.c, line 140: statement not reached
- endwin();
- ------------^
- (ccom): verse.c, line 141: ccom: Internal: schain botch
- }
- ^
- *** Error code 1
- pmake: 1 error
-
- The same compile worked fine on nutmeg. Any ideas? Do we have the dec
- compiler?
-
-
-
- 559.
- Date: Fri, 13 Oct 89 14:11:58 PDT
- From: Fred Douglis <douglis>
- Subject: sendmail
-
- this is because thorsten was using an invalid "option" (Mail foo -c
- bar) that confused sendmail. sendmail works fine normally even if a
- user is unknown. there is a bug when sending to recipient "-c" but
- this isn't related to sprite.
-
-
-
- 560.
- Date: Fri, 13 Oct 89 14:44:11 PDT
- From: tve (Thorsten von Eicken)
- Subject: flaky size on /bin/ls -ls
-
- can someone explain the following (happens on ds3100 & sun4c, dunno sun3)
- [gluttony tve] /bin/ls -ls worm-pipe
- 76 -rw-rw-r-- 1 tve 72175 Oct 13 14:36 worm-pipe
- [gluttony tve] cp worm-pipe foo
- [gluttony tve] ls -ls worm-pipe foo
- 71 -rw-rw-r-- 1 tve 72175 Oct 13 14:42 foo
- 76 -rw-rw-r-- 1 tve 72175 Oct 13 14:36 worm-pipe
- [gluttony tve] diff foo worm-pipe
- [gluttony tve]
-
-
-
- 561.
- Date: Fri, 13 Oct 89 14:45:26 PDT
- From: tve (Thorsten von Eicken)
- Subject: more flaky /bin/ls -ls
-
- sorry, forgot to mention that an /bin/ls -ls after the diff yields:
- [gluttony tve] ls -ls worm-pipe foo
- 76 -rw-rw-r-- 1 tve mic 72175 Oct 13 14:42 foo
- 76 -rw-rw-r-- 1 tve mic 72175 Oct 13 14:36 worm-pipe
-
-
-
- 562.
- Date: Fri, 13 Oct 89 14:49:11 PDT
- From: tve (Thorsten von Eicken)
- Subject: uncompress didn't work on sun4's (fixed)
-
- compress did. I recompiled and reinstalled /a/attcmds/compress for sun4s.
-
-
-
- 563.
- Date: Fri, 13 Oct 89 15:22:22 PDT
- From: brent (Brent Welch)
- Subject: Re: more flaky /bin/ls -ls
-
- You are experiencing the delayed-write caching of Sprite.
- The indirect blocks are not allocated to the file until
- it is written to disk, so they don't show up in the block
- count until sometime after the file is created. If write-back
- caching worries you, remember that all Sprite editors use
- fsync(), which really and truly forces files to disk.
-
-
-
- 564.
- Date: Tue, 10 Oct 89 01:56:12 PDT
- From: tve (Thorsten von Eicken)
- Subject: ds3100 cc seems to define "ultrix"
-
- I know why this is so... I just wanted to point this out in case someone
- ports software which uses #defines ...
- The search for ..../include/sys/limits.h was dependent on ultrix
- being defined, so maybe one can ignore my previous message!?
-
-
-
- 565.
- Date: Tue, 10 Oct 89 08:48:24 PDT
- From: brent (Brent Welch)
- Subject: RPC error
-
- Thyme crashed while handling an open() because it got
- an errant RPC reply from the server. I've seen this before.
- The RPC trace shows the problem:
-
- c3cc0 out 0.0000 Q 32 14 26 6 get attr 16 0 0 0 0 500
- c3cc0 in 0.0000 R 32 14 26 6 get attr 112 0 0 0 0 500
- c3cc1 out 0.0100 Q 32 14 26 6 open 92 15 0 0 0 500
- c3cc1 in 0.0100 R 32 14 26 6 open 112 0 0 0 0 500
- c3cc2 out 0.0000 Q 32 14 26 6 get attr 16 0 0 0 0 500
- c3cc2 out 0.1000 Qp 32 14 26 6 get attr 16 0 0 0 0 500
- c3cc2 in 0.0000 R 32 14 26 6 get attr 112 0 0 0 0 500
- c3cc3 out 0.0100 Q 32 14 26 6 open 92 15 0 0 0 500
- c3cc3 in 0.0000 R 32 14 26 6 get attr 112 0 0 0 0 500
- c3cc3 in 0.0000 R 32 14 26 6 open 112 0 0 0 0 500
-
- See how RPC c3cc3 gets a "get attr" reply instead of an "open" reply.
- Apparently thyme resent its "get attr" request at about the same time
- that mint replied. Then, after it issued its open request it
- picked up the retransmitted RPC "get attr" reply instead of the
- open reply. My hunch is that perhaps the "get attr" reply was sitting
- in thyme's input buffer already, at the time the open request was
- issued, and the client dispatcher is erroneously picking it up.
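-
- One defensive check a client dispatcher could make, sketched in C: accept a
- reply only while a call is outstanding, and only if both its RPC ID and its
- command match the request, dropping anything else as stale. The types,
- field names and command numbers are hypothetical, and the trace doesn't
- say whether Sprite's dispatcher already compares the command.

```c
#include <stdint.h>

enum { CMD_GET_ATTR = 1, CMD_OPEN = 2 };   /* hypothetical command numbers */

typedef struct {
    uint32_t rpcId;     /* sequence number, e.g. 0xc3cc3 */
    int      command;   /* which procedure the packet is for */
} RpcHeader;

typedef struct {
    int       waiting;  /* nonzero while a call is outstanding */
    RpcHeader request;  /* header of the outstanding call */
} ClientChannel;

/*
 * Return 1 if reply answers the outstanding call (and mark the channel
 * idle), or 0 if the caller should discard it.
 */
static int AcceptReply(ClientChannel *chan, const RpcHeader *reply)
{
    if (!chan->waiting) {
        return 0;                       /* nothing outstanding: late duplicate */
    }
    if (reply->rpcId != chan->request.rpcId) {
        return 0;                       /* answer to an earlier call */
    }
    if (reply->command != chan->request.command) {
        return 0;                       /* e.g. a "get attr" reply to an "open" */
    }
    chan->waiting = 0;
    return 1;
}
```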
-
-
-
- 566.
- Date: Tue, 10 Oct 89 08:56:20 PDT
- From: douglis (Fred Douglis)
- Subject: loadavg recovery problem
-
- After the file servers rebooted, i noticed that "finger" didn't list
- many people. turns out several hosts were listed as down. this was still
- true after about a half hour. logging into them must have triggered
- recovery, however, since within a minute of logging into the two i tried out,
- they were listed as up again.
-
-
-
- 567.
- Date: Tue, 10 Oct 89 10:10:55 PDT
- From: Fred Douglis <douglis>
- Subject: repeating console write bug found
-
- ... I hope.
- Turns out that when the buffer overflowed in Dev_SyslogWrite, it
- wouldn't subtract the amount written directly to the console, so it
- would return that 0 bytes were written and Fs_Write would try again.
- My reasoning is that this would happen anytime a user process wrote to
- /dev/syslog when the buffer was full (but not for printfs in the
- kernel, which is why we don't see the problem more often).
-
- I'm remaking dev and will include this fix in the new kernels I'm
- going to build today. I hope to push this stuff out to "new" as quickly
- as possible since I want to start gathering statistics anyway.
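-
- A C sketch of the fix as described: when the buffer is full, bytes sent
- straight to the console still count as written, so a caller that retries
- until every byte is accepted makes progress. The Syslog type, the tiny
- buffer size, and the consoleBytes counter (a stand-in for real console
- output) are hypothetical; this is not Dev_SyslogWrite itself.

```c
#include <stddef.h>
#include <string.h>

#define SYSLOG_SIZE 8   /* deliberately tiny buffer for illustration */

typedef struct {
    char   buf[SYSLOG_SIZE];
    size_t used;           /* bytes currently buffered */
    size_t consoleBytes;   /* stand-in for output written to the console */
} Syslog;

/* Buffer what fits, send the rest to the console, and report all of it. */
static size_t SyslogWrite(Syslog *sl, const char *data, size_t len)
{
    size_t room = SYSLOG_SIZE - sl->used;
    size_t toBuf = (len < room) ? len : room;

    if (toBuf > 0) {
        memcpy(sl->buf + sl->used, data, toBuf);
        sl->used += toBuf;
    }
    sl->consoleBytes += len - toBuf;   /* overflow goes to the console */

    /*
     * The bug was returning only toBuf, which is 0 once the buffer is
     * full, so a retrying caller such as Fs_Write would loop forever.
     */
    return len;
}
```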
-
-
-
- 568.
- Date: Sat, 14 Oct 89 12:54:50 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: Something funny with /dev/syslog?
-
- If I execute "cat /dev/syslog", it returns "/dev/syslog: invalid argument".
- This means no syslog window. Does anyone know of something that changed
- recently?
-
-
-
- 569.
- Date: Sat, 14 Oct 89 13:03:51 PDT
- From: brent (Brent Welch)
- Subject: Fsutil_HandleInstall
-
- I finally saw the bug in Fsutil_HandleInstall that has
- been bothering me for some time. Handle installation
- is sort of divided into two parts so that memory
- allocation can be done outside the Handle monitor lock.
- An external routine does a Fsutil_HandleFetch to see if the
- handle is already there. If it isn't, it allocates memory
- and then drops in to HandleInstallInt routine to install
- the handle under the monitor lock. The bug occurred if the handle appeared in
- the hash table in between the initial Fetch and the
- subsequent InstallInt. The InstallInt was clever enough to
- recheck for the existence of the handle, but it wasn't clever
- enough to return it! The external routine always assumed that
- the memory it allocated was the memory used for the handle,
- but that could be wrong. The result was a garbage handle
- being returned from Fsutil_HandleInstall. I had been suspecting
- the LRU replacement stuff, but I kept overlooking the obvious bug.
- Anyway, Oregano crashed during recovery with a garbage handle
- and this prompted me to look at the code again. I've rebooted
- Oregano (while pounding on its file systems with process migration)
- and it works ok. I'm going to add a little "would-have-crashed"
- print statement and reboot it again to make sure I'm exercising
- the error case.
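-
- A minimal C sketch of the corrected pattern, using a pthread mutex as the
- monitor lock and a small array as the handle table. Handle, HandleInstall()
- and the table are hypothetical; only the shape (fetch, allocate outside the
- lock, re-check under the lock, and return whichever handle actually ends up
- in the table) comes from the description above.

```c
#include <pthread.h>
#include <stdlib.h>

#define TABLE_SIZE 16   /* hypothetical; real handle tables are larger */

typedef struct Handle {
    int id;
    int refCount;
} Handle;

static pthread_mutex_t tableLock = PTHREAD_MUTEX_INITIALIZER;
static Handle *table[TABLE_SIZE];

/* Caller must hold tableLock. */
static Handle *LookupLocked(int id)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i] != NULL && table[i]->id == id) {
            return table[i];
        }
    }
    return NULL;
}

/* Returns the handle for id, or NULL if out of memory or the table is full. */
static Handle *HandleInstall(int id)
{
    Handle *h;
    Handle *fresh;

    /* Fast path: the handle is already installed. */
    pthread_mutex_lock(&tableLock);
    h = LookupLocked(id);
    if (h != NULL) {
        h->refCount++;
    }
    pthread_mutex_unlock(&tableLock);
    if (h != NULL) {
        return h;
    }

    /* Allocate outside the lock. */
    fresh = malloc(sizeof *fresh);
    if (fresh == NULL) {
        return NULL;
    }
    fresh->id = id;
    fresh->refCount = 1;

    pthread_mutex_lock(&tableLock);
    h = LookupLocked(id);          /* someone may have installed it meanwhile */
    if (h != NULL) {
        h->refCount++;             /* return THEIR handle, not fresh */
    } else {
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (table[i] == NULL) {
                table[i] = fresh;
                h = fresh;
                fresh = NULL;      /* now owned by the table */
                break;
            }
        }
    }
    pthread_mutex_unlock(&tableLock);

    free(fresh);                   /* the unused allocation, or NULL */
    return h;
}
```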
-
-
-
- 570.
- Date: Sat, 14 Oct 89 13:05:39 PDT
- From: brent (Brent Welch)
- Subject: Watchdog Reset during migration and recovery
-
- I started a pmake on sage and then rebooted Oregano.
- After Sage recovered its handles and started compiling
- again it suddenly got a Watchdog Reset. I assume that
- some migration related call didn't quite work right.
-
- ps. Thyme was also doing a pmake, but it survived.
-
-
- 571.
- Date: Sat, 14 Oct 89 13:56:06 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: weirdness linting?
-
- I've been trying to lint the net module. If I execute lintsun4c in one
- window, it will try linting it. If I execute it in another window, it says
- it doesn't know how to lintsun4c. It used to know how a few minutes ago.
- The environments, etc, appear to be identical in the 2 windows. Could
- somebody tell me what's happening here?
-
- I'm executing all of this on a sun3.
-
-
-
- 572.
- Date: Sun, 15 Oct 89 15:41:47 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: compiler problem for sun4c net module
-
- The compiler is generating signed byte loads instead of unsigned byte loads
- to access the fields of this structure:
-
- /*
- * Descriptor Ring Pointer (page 21) (Byte swapped. )
- * Also,
- */
- typedef struct NetLERingPointer {
- unsigned short ringAddrLow :16; /* Low order ring address.
- * Must be quad word aligned.
- */
- unsigned int logRingLength :3; /* log2 of ring length. */
- unsigned int :5; /* Reserved */
- unsigned int ringAddrHigh :8; /* High order ring address. */
- } NetLERingPointer;
-
-
- For instance, in the broken version it generates:
-
- 0xf605148c <NetLEReset+240>: ldsb [o0+0x16],o1
- 0xf6051490 <NetLEReset+244>: and o1,0x1f,o1
- 0xf6051494 <NetLEReset+248>: or o1,0x80,o1
- 0xf6051498 <NetLEReset+252>: stb o1,[o0+0x16]
- 0xf605149c <NetLEReset+256>: add l0,0x4,o1
-
- while in the working version it generates:
-
- 0xf6051478 <NetLEReset+220>: ldub [o0+0x12],o1
- 0xf605147c <NetLEReset+224>: and o1,0x1f,o1
- 0xf6051480 <NetLEReset+228>: or o1,0x80,o1
- 0xf6051484 <NetLEReset+232>: stb o1,[o0+0x12]
- 0xf6051488 <NetLEReset+236>: add l0,0x4,o1
-
- for the source code line 259 in netLE.c:
-
- 259 initPtr->recvRing.logRingLength = NET_LE_NUM_RECV_BUFFERS_LOG2;
-
- The kernels to compare are sun4c.broken and sun4c.works in my kernel
- directory. They are identical except that in the working version, the net
- module was compiled with the old compiler and assembler. Both were
- compiled with optimization on in the net module.
-
- Didn't we go through this once before when we first switched to gcc and the
- new assembler? Sometime in mid-July? I have it in my log book as being
- July 14th.
-
-
-
- 573.
- Date: Sun, 15 Oct 89 16:20:10 PDT
- From: deboor@buddy.Berkeley.EDU (Adam R de Boor)
- Subject: Re: compiler problem for sun4c net module
-
- in the code you sent, it doesn't matter much if it does an unsigned or a signed
- load, since it immediately ands the result with 0x1f. What is of more
- concern, I should think, is the four-byte difference in the offset used to
- access the field, no?
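-
- A quick C check of that point: emulating "load byte; and 0x1f; or 0x80;
- store byte" with sign extension (ldsb) and with zero extension (ldub) gives
- the same stored byte for all 256 inputs, because the AND clears every bit
- the extension could affect. The function names are hypothetical, and the
- (int8_t) conversion assumes two's complement, as on the sun4.

```c
#include <stdint.h>

/* ldsb: sign-extend the byte, then mask, set bit 7, and keep the low byte. */
static uint8_t UpdateViaSignedLoad(uint8_t mem)
{
    int32_t r = (int8_t)mem;
    r = (r & 0x1f) | 0x80;
    return (uint8_t)r;
}

/* ldub: zero-extend the byte, then the same mask, or, and store. */
static uint8_t UpdateViaUnsignedLoad(uint8_t mem)
{
    uint32_t r = mem;
    r = (r & 0x1f) | 0x80;
    return (uint8_t)r;
}
```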
-
-
-
- 574.
- Date: Sun, 15 Oct 89 16:58:45 PDT
- From: tve (Thorsten von Eicken)
- Subject: gluttony in weird state IP is up, RPC is down
-
- loadavgd lists it as being down.
- rpccmd -ping times out
- /sprite/cmds/ping answers (!) but with ~300ms delay
- what's that? I think I had it once before. Is the kernel dead but the
- user processes still alive? (huh?)
-
-
-
- 575.
- Date: Mon, 16 Oct 89 11:47:35 PDT
- From: root (The Sprite God)
- Subject: No add host script
-
- There obviously isn't a script that adds a Sprite to the
- network because there were a number of details left out
- regarding Garlic (a.k.a. Mustard). The symbolic link for
- its swap directory was wrong, and there wasn't an entry
- for it in /sprite/boot.
-
-
-
- 576.
- Date: Mon, 16 Oct 89 11:48:28 PDT
- From: root (The Sprite God)
- Subject: network routing
-
- We need to fix network routing for Sprite. When Mustard
- changed its identity to Garlic we had to rerun netroute
- on every host so that the ReverseArp done at boot time
- got the correct SpriteID back.
-
-
-
- 577.
- Date: Mon, 16 Oct 89 11:50:10 PDT
- From: root (The Sprite God)
- Subject: yp ethers needed for Sprite sun3s
-
- It turns out that an entry in the yp ethers database
- is needed in order for a Sun3 to find out its
- Internet Address during bootstrap. Apparently
- Sprite doesn't properly do RARP. Furthermore,
- manually adding an arp entry on ginger didn't help.
- Not until I updated /etc/ethers and did a ypmake
- was Garlic (a.k.a. Mustard) able to get an Internet address.
-
-
-
- 578.
- Date: Mon, 16 Oct 89 11:51:55 PDT
- From: shirriff (Ken Shirriff)
- Subject: anise->ginger rcp
-
- When I try to rcp a kernel from anise to ginger, the rcp seems to go
- into the twilight zone after copying, say, 188416 or 24576 bytes. After
- that nothing happens.
- Also, "size" on the sun4 returns exit status 2, causing my pmake to quit
- unless I do pmake -i.
-
-
-
- 579.
- Date: Mon, 16 Oct 89 11:56:29 PDT
- From: root (The Sprite God)
- Subject: ds3100 need yp ethers entry, too
-
- It turns out that Sprite DecStations also need an entry in
- the YP ethers database so they too can ReverseArp
- and discover their Internet Address. We need to fix
- Sprite so it can do its own ReverseArp.
-
-
-
- 580.
- Date: Mon, 16 Oct 89 12:30:50 -0700
- From: bks@okeeffe.Berkeley.EDU (Brian K. Shiratsuki)
- Subject: yp ethers needed for Sprite sun3s
-
- i see. i purposefully deleted the entries from the sunos tables
- because i didn't want the sun servers to compete with the sprite
- server(s).
-
-
-
- 581.
- Date: Tue, 17 Oct 89 10:10:17 PDT
- From: brent (Brent Welch)
- Subject: bib broken
-
- bib was ported to Sprite some time ago, but it doesn't
- quite work right. In a short paper with four references
- it uses the last reference for all of them! The citations
- are [author88a] [author88b] and so on, and at the end
- the last citation is repeated four times. The example
- is in ~brent/doc/wwos.89 . There is a Makefile there.
-
-
-
- 582.
- Date: Tue, 17 Oct 89 11:01:32 PDT
- From: Fred Douglis <douglis>
- Subject: proc_serverproc needs to be dynamic
-
- background server processes should be handled like rpc_servers --
- created when needed, up to a large limit, and reclaimed when not
- needed. otherwise we run into problems like brent's needing to have a
- separate recovery process, or kernels getting wedged when all the server
- processes go to sleep on some condition.
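-
- A minimal sketch of the grow-on-demand, reclaim-when-idle policy described above, assuming a simple counter-based pool. ServerPool and its fields are hypothetical names, not the kernel's rpc_server code, and the actual creation and termination of processes is left out.
```c
#include <stdbool.h>

/* Hypothetical bookkeeping for a pool of background server processes. */
typedef struct {
    int numServers;   /* processes currently in existence */
    int numIdle;      /* processes waiting for work */
    int maxServers;   /* hard upper limit on the pool size */
} ServerPool;

/* When work is queued: create a new process only if none is idle
   and the limit has not been reached. */
bool ShouldCreateServer(const ServerPool *p) {
    return p->numIdle == 0 && p->numServers < p->maxServers;
}

/* Periodic sweep: how many idle processes could be reclaimed while
   still keeping `keepIdle` of them around for the next burst. */
int ServersToReclaim(const ServerPool *p, int keepIdle) {
    return p->numIdle > keepIdle ? p->numIdle - keepIdle : 0;
}
```
- With a large maxServers, processes sleeping on one condition would only wedge the kernel if the limit itself were reached.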
-
-
-
- 583.
- Date: Tue, 17 Oct 89 12:26:10 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: kgdb.sun4 goes into the debugger
-
- On murder, I was debugging covet with kgdb.sun4. I did a "pid 0xc"
- command and it seg faulted. Here is the stack trace:
-
- #0 0x44310 in Fs_Read ()
- #1 0x4018a in read ()
- #2 0x98bc in myread (desc=6, addr=(caddr_t) 0x9ca2c20 "", len=3394721) (core.c line 459)
- #3 0xc94e in psymtab_to_symtab (pst=(struct partial_symtab *) 0xc9334) (dbxread.c line 2739)
- #4 0x2f1d2 in find_pc_symtab (pc=4127256644) (symtab.c line 1122)
- #5 0x2c266 in select_frame (frame=(FRAME) 0x172cbc, level=0) (stack.c line 615)
- #6 0x1ad72 in normal_stop () (infrun.c line 1084)
- #7 0x1a028 in start_remote () (infrun.c line 414)
- #8 0x288fa in remote_attach (pid=12) (remote.c line 262)
- #9 0x1bd22 in pid_command (args=(caddr_t) 0x7dcbc "0xc", from_tty=1) (kgdbcmd.c line 89)
- #10 0x1cb58 in execute_command (p=(caddr_t) 0x7dcbc "0xc", from_tty=1) (main.c line 481)
- #11 0x1cc2a in command_loop () (main.c line 507)
- #12 0x1ca14 in main (argc=2, argv=(caddr_t *) 0x9fdfd04, envp=(caddr_t *) 0x9fdfd10) (main.c line 434)
-
-
-
- 584.
- Date: Tue, 17 Oct 89 12:29:09 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: deadlock on covet
-
- Before the debugger crashed, I got the following info about the deadlock
- on covet:
-
- The deadlock was over the schedMutex lock. The process holding the lock
- was the "su" program. It was grabbing the lock in Sched_LockAndSwitch().
-
- The current process was the ipServer. It was grabbing the lock in
- Sched_GatherProcessInfo().
-
-
-
- 585.
- Date: Wed, 18 Oct 89 12:49:03 PDT
- From: brent (Brent Welch)
- Subject: Vm_Stat.kernMemPages wrong on ds3100
-
- The kernMemPages value looks more like a byte count
- as opposed to a page count. On pepper, for example,
- it is currently 1050532.
-
- Actually, the kernel page count on pepper is 1050532 / 4,
- or 262633. It isn't clear yet what this number is.
-
- The kernMemPages field includes a very large hole
- in the VM address space. The kernel code is loaded
- at 0x80000000, while the data is loaded at 0xc0000000,
- and the kernMemPages is wrongly calculated by subtracting
- the start of the code from the end of the data. I can
- account for this with the data I've already taken, but
- I think John H. understands how to fix this.
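-
- A minimal sketch of the difference between the two calculations. It assumes 4 KB pages and made-up segment end addresses; only the 0x80000000 and 0xc0000000 base addresses come from the report. KernSegments and both functions are hypothetical names, not Sprite's VM code.
```c
#include <stdint.h>

/* Hypothetical kernel segment bounds, each a half-open range [start, end). */
typedef struct {
    uint32_t codeStart, codeEnd;
    uint32_t dataStart, dataEnd;
} KernSegments;

/* The calculation described above: a single span from the start of the code
   to the end of the data, which also counts the hole between the segments. */
uint32_t PagesBySpan(const KernSegments *k, uint32_t pageSize) {
    return (k->dataEnd - k->codeStart + pageSize - 1) / pageSize;
}

/* Counting each segment separately leaves the hole out. */
uint32_t PagesBySegment(const KernSegments *k, uint32_t pageSize) {
    uint32_t code = (k->codeEnd - k->codeStart + pageSize - 1) / pageSize;
    uint32_t data = (k->dataEnd - k->dataStart + pageSize - 1) / pageSize;
    return code + data;
}
```
- With a 1 MB code segment and a 2 MB data segment, the span version reports 262656 pages instead of 768. The extra pages are exactly the unmapped gap between the end of the code and 0xc0000000.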
-
-
-
- 586.
- Date: Wed, 18 Oct 89 19:28:21 PDT
- From: eklee (Edward K. Lee)
- Subject: tx window disappeared
-
- I was running a shellscript on sassafras from forgery when after about
- 20 minutes or so, I got the following message to my syslog and
- my window to sassafras died along with whatever I happened to be running.
- PdevWrite: signal 14
- PdevWrite: signal 14
- PdevWrite: signal 14
-
- This is the second time that this has happened to me.
-
-
-
- 587.
- Date: Thu, 19 Oct 89 10:39:48 PDT
- From: brent (Brent Welch)
- Subject: Allspice crash
-
- Allspice died with a recursive TtyBufferOverflow. It was streaming
- this message to its console and not responding to any interrupts.
- I had accidentally used the more program and driven the terminal
- into a goofy state. I just left it that way because it hasn't
- always worked for me to power cycle the terminal. Perhaps I
- should have tried that. Some time later the crash occurred;
- I think there were at least several hours between when I
- wedged the terminal (around noon, I think) and when
- Allspice crashed at about 6:25.
-
-
-
- 588.
- Date: Thu, 19 Oct 89 11:30:28 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: tx in debugger on sparcstations
-
- Tx frequently dies on the sparcstation. If I remember correctly, which I seem
- to do with decreasing frequency, Mendel reported a tx problem on the sun3
- where it died with its pc set to the instruction after a select trap. That's
- what's happening here.
-
- #0 0x481ec in Fs_RawSelect ()
- #1 0x3b5fc in Fs_Dispatch ()
- #2 0x2214 in main () (tx.c line 135)
- #3 0x3b024 in start ()
-
- 0x481e0 <Fs_RawSelect>: sethi %hi(0x0),%g1
- 0x481e4 <Fs_RawSelect+4>: or %g1, 72, %g1 ! 0x48
- 0x481e8 <Fs_RawSelect+8>: t 3, %g0 !0x3
- 0x481ec <Fs_RawSelect+12>: jmpl %o7, 8, %g0 ! 0x8
- 0x481f0 <Fs_RawSelect+16>: nop
-
-
-
- 589.
- Date: Thu, 19 Oct 89 11:32:04 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: more on tx
-
- And if I detach the program in the debugger, tx picks up and keeps
- running fine. I believe Mendel mentioned that funny aspect as well.
-
-
-
- 590.
- Date: Thu, 19 Oct 89 17:45:25 PDT
- From: brent (Brent Welch)
- Subject: inetd on mint
-
- inetd went infinite on mint. I did a gcore into
- /sprite/src/daemons/inetd/inetd.core.8200e
- However, the stack backtrace is simply #0 0xc658 in Sig_SetHoldMask ()
- so perhaps gcore isn't the right thing to figure out infinite loops.
- If someone knows gdb better they can try to figure things out.
-
-
-
- 591.
- Date: Fri, 20 Oct 89 08:49:00 PDT
- From: brent (Brent Welch)
- Subject: Allspice network interface reset
-
- When I came in Friday morning Allspice was in slow mode.
- An rpcecho reported timeouts 110 resends 110 acks 11 in 100 attempts!
- I reset its network interface by hitting break-n on its console,
- and now it seems fine.
-
-
-
- 592.
- Date: Fri, 20 Oct 89 11:53:29 PDT
- From: pmchen (Peter M. Chen)
- Subject: fatal error in /sprite/cmds/vi
-
- I've been getting these off and on. It goes away the second time I issue
- the vi, but it is kind of disconcerting. Any ideas about why these have
- been popping up? Anyone else experiencing these?
-
- The error is occurring on the decstation.
-
-
-
- 593.
- Date: Fri, 20 Oct 89 12:44:45 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: spur spritemon broken
-
- The new spritemon doesn't work on the spur. It dies in XtInitialize.
- I've replaced it with the old version.
-
-
-
- 594.
- Date: Fri, 20 Oct 89 15:00:58 PDT
- From: brent (Brent Welch)
- Subject: mint overload on friday
-
- Mint died on Friday, after being up almost a week.
- It was struggling along when I went to investigate it,
- spending most of its time generating
- TtyInputBufferOverflow
- messages, along with messages about clients recovering, etc.
- I'm not sure what triggered the situation, but it eventually
- got so bogged down printing error messages that it couldn't
- make forward progress. I eventually got some keystrokes
- through, enough to sync the disks and hurl it into the
- debugger. The main thing I noticed from the debugger was
- that several processes were in the ready state, but
- presumably they weren't scheduled because of the heavy
- tty traffic. On an up note, when I rebooted mint I
- got my little print statement indicating that the bug
- concerning returning garbage handles was successfully
- tested. Mint would have died during recovery if this
- hadn't been fixed. On a down note, each client had to
- recover an average of 3 times before things settled down.
- 89 recovery attempts were made, and 20585 reopen RPCs
- were serviced. The last client finished recovery 5 minutes
- and 40 seconds after mint enabled its RPC service.
-
-
-
- 595.
- Date: Fri, 20 Oct 89 15:12:14 PDT
- From: Fred Douglis <douglis>
- Subject: Re: mint overload on friday
-
- that ttyinputbufferoverflow message is a pain in the neck. when i
- look at the tty stuff to see about processing at interrupt time, I can
- also put in a check so this message is printed only once....
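-
- A minimal sketch of a print-once check, written as user-level C with fprintf standing in for the kernel's own print routine. The flag, the counter, and both function names are hypothetical. Re-arming the warning after the buffer drains, with a count of suppressed repeats, is an assumption that goes beyond "printed only once".
```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical suppression state for the overflow warning. */
static bool overflowReported = false;
static unsigned long overflowsSuppressed = 0;

/* Called on every input-buffer overflow. Prints only the first
   occurrence and returns true if a message was printed. */
bool NoteInputOverflow(void) {
    if (overflowReported) {
        overflowsSuppressed++;
        return false;
    }
    overflowReported = true;
    fprintf(stderr, "TtyInputBufferOverflow\n");
    return true;
}

/* Called once the buffer has drained: summarize, then re-arm the warning. */
void InputBufferDrained(void) {
    if (overflowsSuppressed > 0)
        fprintf(stderr, "TtyInputBufferOverflow: %lu more suppressed\n",
                overflowsSuppressed);
    overflowReported = false;
    overflowsSuppressed = 0;
}
```
- The console then carries one line per burst of overflows rather than one per lost character, so the printing itself cannot starve the machine.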
-
-
-
-
- 596.
- Date: Fri, 20 Oct 89 15:52:11 PDT
- From: Fred Douglis <douglis>
- Subject: update not setuid
-
- the ds3100 version of update, dated 10/3, was not setuid to root. Did
- someone install this by hand or something?
-
-
-
- 597.
- Date: Fri, 20 Oct 89 19:03:34 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: lprm bug
-
-
- If I queue a print job on a ds3100, it won't work (known bug), and then
- if I try to run lprm on a sun3 to delete the job I get:
-
- cfA025hijack.Berkeley.EDU: Permission denied
-
-
-
-
- 598.
- Date: Fri, 20 Oct 89 19:15:59 PDT
- From: pmchen (Peter M. Chen)
- Subject: Re: login: must be root to override defaults
-
- Yes, I restarted inetd on mustard by hand. This was necessary because on
- the decstations, you can't kill X and restart it without killing and
- restarting inetd and ipserver by hand.
-
- How should I make sure inetd has a clear environment? Start it up as
- root?
-
-
-
- 599.
- Date: Sun, 22 Oct 89 13:32:00 PDT
- From: brent (Brent Welch)
- Subject: Assault crash, out of memory?
-
- Assault hung today, after being up 8 days.
- I think it ran out of memory, but I can't
- be sure because the ds3100.1.032 kernel was
- carefully removed from all hosts! I think
- I already complained about this. Perhaps
- with our huge /sprite/src/kernel partition
- we won't be so hasty when removing kernel images.
- For the N'th time, never remove a kernel image
- if a file server is running it. This is easy
- to check, and unforgivable. (well, I'll forgive
- you this time.) Anyway, I've rebooted Assault
- with JHH.192. Don't even think about removing this
- kernel. This kernel lets the kernel and the fs cache
- grow much larger, so Assault shouldn't croak.
-
-
-
-
- 600.
- Date: Sat, 21 Oct 89 15:50:34 PDT
- From: brent (Brent Welch)
- Subject: Changing a domain's identity
-
- A weakness in the current prefix table stuff showed
- up when we moved /sprite/src/kernel to allspice.
- Although we first unmounted /sprite/src/kernel from
- Oregano and remounted that domain as /sprite/src/kernel.old,
- the internal domain number didn't change. This meant
- that clients which had prefix table entries for /sprite/src/kernel
- with the old token from Oregano were still accessing Oregano.
- What we need to do is change the internal domain number
- so the tokens (fileIDs) on the clients become invalid.
- John H. suggested that at boot time a server could check
- to see if it's mounting a disk under the same prefix
- as before. This information is kept in the domain's
- summary sector on disk.
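-
- A sketch of the boot-time check suggested above, under loudly stated assumptions. SummaryInfo, its fields, PREFIX_MAX, and the nextFreeDomain counter are all hypothetical stand-ins. Neither the real summary-sector layout nor the way fresh domain numbers would be allocated is described in the report.
```c
#include <stdbool.h>
#include <string.h>

#define PREFIX_MAX 64   /* hypothetical maximum prefix length */

/* Hypothetical subset of a domain's on-disk summary sector. */
typedef struct {
    char lastPrefix[PREFIX_MAX];  /* prefix the domain was last exported under */
    unsigned int domainNumber;    /* embedded in the tokens clients hold */
} SummaryInfo;

/* At boot: if the domain is mounted under a different prefix than last
   time, give it a new domain number so tokens cached by clients under the
   old prefix stop matching. Returns true if the number changed.
   `nextFreeDomain` stands in for however new numbers would be allocated. */
bool CheckDomainIdentity(SummaryInfo *s, const char *prefix,
                         unsigned int *nextFreeDomain) {
    if (strncmp(s->lastPrefix, prefix, PREFIX_MAX) == 0)
        return false;                       /* same identity as before */
    s->domainNumber = (*nextFreeDomain)++;
    strncpy(s->lastPrefix, prefix, PREFIX_MAX - 1);
    s->lastPrefix[PREFIX_MAX - 1] = '\0';
    /* The updated summary would then be written back to disk. */
    return true;
}
```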
-
-
-
- 601.
- Date: Mon, 16 Oct 89 14:33:43 PDT
- From: Fred Douglis <douglis>
- Subject: gethostname problem
-
- gethostname was changed sometime about a month ago to call
- Proc_GetHostIDs rather than Sys_GetMachineInfo. Unfortunately, it
- calls it to get the physical host rather than the virtual host, which
- means "hostname" and anything else that uses it will detect that
- migration has occurred. Is this a goof or was it intentional? I'm
- changing it to return the hostname for the home node. the world may
- need to be relinked.
-
-
-
-
- 602.
- Date: Mon, 16 Oct 89 17:20:24 PDT
- From: pmchen (Peter M. Chen)
- Subject: official bug report on gremlin
-
- This is the official bug report version of the gremlin problem I mailed to spriters:
-
- Ed and I have been trying to use gremlin on the ds3100's and have gotten
- a lot of weird things happening.
-
- 1) When you put down a point, black blobs often come on the screen.
- 2) The shift and control keys don't do what they're supposed to. Instead,
- they seem to repeat the last command issued.
-
- 3) the help screen is really garbled.
-
- Fred and John H. report that they've also run into these problems, which
- make gremlin extremely painful to use.
-
-
-
-
- 603.
- Date: Mon, 16 Oct 89 18:33:16 PDT
- From: eklee (Edward K. Lee)
- Subject: ds3100 crashes with FP exception in kernel
-
- I was running Sprite version 1.032 (ds3100).
- Running ~eklee/simtest/simtest from X causes the kernel to crash with a FP
- exception. I was able to repeat this three times consecutively.
- (Could trashing machine registers from user mode cause this to happen?)
-
-
-
-
- 604.
- Date: Mon, 16 Oct 89 20:40:24 PDT
- From: brent (Brent Welch)
- Subject: Oregano's network interface
-
- Sometime around 6:30pm Monday night Oregano's
- network interface went out-to-lunch. I came
- in and noticed a number of error messages and
- some recovery stuff. When I tried to do things
- like grep through system code there was essentially
- no progress until I hit L1-n on Oregano's keyboard
- to reset its interface. Someone (Mendel?) needs
- to figure out how to put in a watchdog on this
- flaky Intel interface.
-
-
-
- 605.
- Date: Sun, 22 Oct 89 17:32:54 PDT
- From: tve (Thorsten von Eicken)
- Subject: gdb problems on sun4
-
- I can't manage to get to variables. I always get the message
- 'No symbol "foo" in current context'.
- Is this known? Am I missing something? I compiled with -g, and no optimization.
- Something funny though: when the symbol-file is read, I get an error message:
-
- Reading symbol data from /mic/X11R3/src/cmds/Xsp/sun4.md/Xsp...done.
- Type "help" for a list of commands.
- (gdb) Warning: Unknown symbol-type code `P' at symtab pos 296.
-
- The same program, compiled for the sun3, loads into gdb without error.
-
-
-
- 606.
- Date: Sun, 22 Oct 89 19:19:02 PDT
- From: tve (Thorsten von Eicken)
- Subject: mkmf handles file named "version.h" specially
-
- this is NOT said in the manual, as far as I can remember!
- Thorsten
- (and I don't think it's a nice idea either)
-
-
-
- 607.
- Date: Sun, 22 Oct 89 19:37:45 PDT
- From: tve (Thorsten von Eicken)
- Subject: mkmf/pmake doesn't know how to make sun4.md/lex.o from lex.l
-
- on the sun3 and ds3100 everything is fine. on the sun4 i get a
- pmake: Can't figure out how to make sun4.md/lex.o. Stop
- error. I did many mkmf's, pmake tidy, etc., with no change. weird!
-
-
-
- 608.
- Date: Mon, 23 Oct 89 10:16:35 PDT
- From: shirriff (Ken Shirriff)
- Subject: tx refresh on ds3100
-
- If I clear a tx window and then select "Set Termcap" from the "Control"
- window, on the decstation, the window scrolls before the menu
- disappears, leaving a white rectangle on the normally gray part of
- the window. This doesn't happen on the sun3.
-
-
-
- 609.
- Date: Mon, 23 Oct 89 18:37:56 PDT
- From: tve (Thorsten von Eicken)
- Subject: on sun4, pmake of bigcmdtop doesn't always do the final link
-
- It always goes down the subdirs and produces the linked.o, but it
- won't always do the final link of all the linked.o into the command.
- The behaviour is not consistently repeatable. It happens with
- /mic/X11R3/src/cmds/Xsp (the X11R3 server).
-
-
-
- 610.
- Date: Mon, 23 Oct 89 19:25:13 PDT
- From: tve (Thorsten von Eicken)
- Subject: is the cc man page up-to-date with gcc 1.36?
-
- It doesn't seem so... the comments for -gg are out of date, -fcombine_regs
- doesn't exist any more, etc...
-
-
-
-
- 611.
- Date: Tue, 24 Oct 89 10:12:14 PDT
- From: brent (Brent Welch)
- Subject: mint crash
-
- Mint died last night after /sprite filled up. After it ran out
- of paper it sort of hung, and then when I added paper I got
- the good old "TtyInputBufferOverflow" problem. Apparently
- all the Proc_ServerProc's were stuck on something. It is
- possible they were hung on recovery with Oregano. Oregano
- died for a different reason, a consistency check in the
- Reopen code that shouldn't have been there. Perhaps we
- should dedicate a process to tty input? I had to do this
- for recovery pinging because of similar problems. Historically
- we used to have several different kernel processes for different
- tasks, but Mike Nelson gradually changed most things over to
- use Proc_CallFunc. These are subject to starvation, mainly
- because they are used to handle page faults, and a crashed
- server can block page faults, thereby using up the Proc_ServerProcs.
- In this case, I don't think creating more Proc_ServerProcs is
- the right solution. Restructuring the page fault code so the
- retry is done at a higher-level, not using a Proc_ServerProc
- would be best.
-
-
-
- 612.
- Date: Tue, 24 Oct 89 11:03:27 PDT
- From: Fred Douglis <douglis>
- Subject: /tmp
-
- the remote link for /tmp disappeared sometime recently. i was unable
- to start up X properly a few minutes ago. anyone know the last time
- they're sure /tmp was still around? we might be able to focus on a
- recent reboot (like my own machine, or some other) as a culprit.
-
-
-
-
- 613.
- Date: Tue, 24 Oct 89 14:15:26 PDT
- From: brent (Brent Welch)
- Subject: bootp infinite
-
- A bootp went infinite on mint. I took a quick look;
- it was in Fs_RawRead, which is called from recvfrom(),
- which is called from main line 165. I suspect some
- bug in the interaction with the retry loop in Fs_RawRead.
-
-
-
- 614.
- Date: Tue, 24 Oct 89 17:51:37 PDT
- From: shirriff (Ken Shirriff)
- Subject: nm on ds3100
-
- If I do nm ds3100.md/libc.o | grep errno I get
- V errno
- The man page says nothing about what "V" means. Anyone know?
-
-
-
- 615.
- Date: Tue, 24 Oct 89 18:19:59 PDT
- From: brent (Brent Welch)
- Subject: Re: Mx death (bad disk mapping?)
-
- Hmm. There shouldn't be any fragmenting going on out that
- far in the file. Nothing is fragmented beyond 40K, and
- 0xe000 is at 57K. 0x1e000 is 64K later. This isn't
- even block aligned. I don't think it's RPC fragmenting
- because that isn't neatly aligned anyway, it crams as
- much as possible into each packet. It doesn't look like
- a cache hashing bug because that uses the standard
- hash function, multiply by a large prime, add 12345, etc.
- (light bulb goes on)
- It could be a disk alignment bug, what with our fancy
- mapping of blocks onto sectors. 64K is about a track size...
- Hmm, mint has a track size of 23K on its eagle, but blocks
- do overlap on adjacent tracks by 6K. It is quite possible
- there is some overlap that I don't expect because the drive
- is outsmarting me, similar to what we experienced on /mic,
- although rarer because it's due to sector slipping. What we
- should do the next time we have one of these botched files is
- determine what the disk block numbers involved are.
- brent
- (If that isn't clear, it seems possible that the last block
- in a cylinder is somehow mapped back onto another block
- in the same cylinder. I'm not sure exactly. I do know
- that things packed quite neatly into cylinders on the Eagles:
- ----------------------------------------------------
- |..1.....|..2.....|..3.....|..4.....|..5.....|..6... track 1
- ----------------------------------------------------
- ..|..7.....|..8.....|..9.....|..10....|..11....|..12 track 2
- ----------------------------------------------------
- ....|..13....|..14....|..15....|..16....|..17....|.. track 3
- ----------------------------------------------------
- .18...|..19....|..20....|..21....|..22....|..23....| track 4
- ----------------------------------------------------
- 20 tracks in all, this pattern is repeated 5 times per cylinder.
- If the drive is stealing a block from me due to a bad sector,
- I don't know what might happen.)
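-
- The diagram is reproduced exactly if one assumes 4 KB file system blocks packed back to back onto 23 KB tracks. The block size is an assumption here; the report does not state it. The sketch below maps a block number to the tracks it occupies under that purely linear model. It deliberately ignores sector slipping by the drive, which is exactly the effect suspected above. The constant and function names are hypothetical.
```c
/* Assumed geometry: 4 KB blocks packed back to back onto 23 KB tracks,
   20 tracks per cylinder. Blocks and tracks are numbered from 1, as in
   the diagram. Sector slipping by the drive is NOT modeled. */
#define BLOCK_KB       4
#define TRACK_KB       23
#define TRACKS_PER_CYL 20

/* First and last track occupied by `block`. */
void BlockToTracks(int block, int *firstTrack, int *lastTrack) {
    long startKB = (long)(block - 1) * BLOCK_KB;   /* first KB of the block */
    long endKB = startKB + BLOCK_KB - 1;           /* last KB of the block */
    *firstTrack = (int)(startKB / TRACK_KB) + 1;
    *lastTrack = (int)(endKB / TRACK_KB) + 1;
}

/* Cylinder (numbered from 1) that holds the start of `block`. */
int BlockToCylinder(int block) {
    int first, last;
    BlockToTracks(block, &first, &last);
    return (first - 1) / TRACKS_PER_CYL + 1;
}
```
- Under this model, 23 blocks fill every 4 tracks, and 3 of them (6, 12, and 18 in the diagram) straddle a track boundary. A cylinder therefore holds 115 blocks. Any block the drive relocates breaks this arithmetic, which is why recording the actual block numbers of the next botched file would help.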
-
-
-
-
- 616.
- Date: Wed, 25 Oct 89 10:00:12 PDT
- From: brent (Brent Welch)
- Subject: Warning: receiver framing error on mouse
-
- Either sage's mouse is slowly croaking, or the behavior
- of the tty-driver needs to be improved when there is
- a "receiver framing error on mouse". I can wedge
- my mouse by rapidly moving it around my screen.
- I get the error message and the mouse freezes. I then
- disconnect and reconnect my mouse and continue operation.
- Can't we reset the serial line (issue a break or something?)
- in this case?
- brent
-
-
- 617.
- Date: Wed, 25 Oct 89 10:49:28 PDT
- From: Fred Douglis <douglis>
- Subject: prefix mapping bug
-
- this may be the same as something we discussed before, but I'm not
- sure...
-
- % df /c
- Prefix Server KBytes Used Avail % Used
- /tmp oregano 300696 240823 29803 88%
-
- wasn't getwd supposed to fix this? does df do its own equivalent
- operation or something?
-
-
-
-
- 618.
- Date: Wed, 25 Oct 89 13:38:17 -0700
- From: tve@ernie.Berkeley.EDU (Thorsten Von Eicken)
- Subject: nfsmounts on oregano very unreliable
-
- Right now, msgs doesn't work on gluttony and hangs forever (can't even kill).
- /eros/octtools is not available and hangs forever.
- Same yesterday evening.
- df hangs because Oregano's nfs stuff is botched.
- I don't know where the problem is, but I get the impression I can't rely at
- all on the nfsmount stuff. Any comment? Shall I just forget about it and
- consider it as a probabilistic service?
- -Thorsten
- Sorry if I sound harsh, I should have waited to calm down before sending
- this mail... but from home (with a stupid tty) unkillable processes are a
- real pain (can't just delete the tx window).
-
-
-
- 619.
- Date: Wed, 25 Oct 89 16:41:12 PDT
- From: tve (Thorsten von Eicken)
- Subject: why isn't the dbm library installed?
-
- nor made for the sun4. I need it in X11R3. I'm gonna make the lib for sun3 and
- sun4 in /sprite/src/lib/dbm. Should it be installed?
-
-
-
- 620.
- Date: Wed, 25 Oct 89 16:48:26 PDT
- From: tve (Thorsten von Eicken)
- Subject: wrong error message when installing
-
- Have a look at why this failed:
-
- [burble dbm] pmake install
- --- /sprite/lib/lint.sun4/llib-ldbm.ln ---
- Installing: /sprite/lib/lint.sun4/llib-ldbm.ln
- Couldn't create "/sprite/lib/lint.sun4/llib-ldbm.ln": file already exists.
- *** Error code 1
- pmake: 1 error
- [burble dbm] l -d /sprite/lib/lint.sun4
- 1 drwxrwxr-x 2 mendel wheel 512 Oct 21 13:08 /sprite/lib/lint.sun4/
- [burble dbm] l /sprite/lib/lint.sun4
- total 107
- 1 drwxrwxr-x 2 mendel wheel 512 Oct 21 13:08 ./
- 2 drwxrwxr-x 44 root sprite 1536 Oct 22 12:17 ../
- 60 -rw-rw-r-- 1 rab wheel 55198 Oct 21 13:07 llib-lc.ln
- 1 -rw-rw-r-- 1 mendel wheel 517 Jul 21 14:31 llib-lcmd.ln
- 10 -rw-rw-r-- 1 douglis wheel 9853 Sep 27 13:50 llib-lcurses.ln
- 1 -rw-rw-r-- 1 mendel wheel 525 Aug 11 15:52 llib-ll.ln
- 4 -rw-rw-r-- 1 rab wheel 3462 Oct 9 12:09 llib-lm.ln
- 17 -rw-rw-r-- 1 shirriff wheel 17167 Oct 16 17:35 llib-lmx.ln
- 5 -rw-rw-r-- 1 mendel wheel 4124 Jul 21 14:15 llib-lsx.ln
- 6 -rw-rw-r-- 1 ouster wheel 5731 Oct 17 08:30 llib-ltcl.ln
- [burble dbm]
-
- Obviously it couldn't create the file because I have no write access to
- the DIRECTORY. It has nothing to do with the file itself...
- NB: I'll change the dir to be group sprite.
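-
- A minimal sketch of reporting the error that actually occurred. The message is built from the saved errno via strerror() instead of a fixed "already exists" string. This is not Sprite's install source; CreateForInstall is a hypothetical name. With this approach, a non-writable directory would produce the system's permission-denied message, and a missing directory its no-such-file message.
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to create (or truncate) `path`. Returns 0 on success, otherwise
   the errno value, after printing a message derived from that errno. */
int CreateForInstall(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0664);
    if (fd < 0) {
        int err = errno;   /* save it before any other call can change it */
        fprintf(stderr, "Couldn't create \"%s\": %s\n", path, strerror(err));
        return err;
    }
    close(fd);
    return 0;
}
```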
-
-
-
- 621.
- Date: Thu, 26 Oct 89 15:29:33 PDT
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: Xmfb.new crash
-
- My ds3100 window system just died. Maybe someone with access to the sources
- could make a quick check to see if anything obvious is wrong. It crashed
- while /usr was screwed up so maybe that has something to do with it.
-
- Segmentation fault [OpenFont:88 +0x8,0x420698]
- Source not available
- (dbx) where
- > 0 OpenFont(0xffff, 0x0, 0x0, 0x100a7f50, 0x7ddffba4) ["dixfonts.c":88, 0x420698]
- 1 ProcOpenFont(0x7ddffba4, 0x100a7758, 0x419f4c, 0x1, 0x2) ["dispatch.c":1067, 0x412a80]
- 2 dispatch.Dispatch(0x0, 0x0, 0x0, 0x0, 0x10009430) ["dispatch.c":316, 0x410f08]
- 3 main.main(0x0, 0x0, 0x0, 0x0, 0x0) ["main.c":242, 0x402da0]
-
-
-
-
- 622.
- Date: Fri, 27 Oct 89 13:54:26 PDT
- From: pmchen (Peter M. Chen)
- Subject: wrong server ID's
-
- Warning: Rpc_Dispatch, wrong server ID 25
- Client 33 rpc 2 at address: 08:00:20:01:7b:fc
- Warning: Rpc_Dispatch, wrong server ID 9
- Client 33 rpc 2 at address: 08:00:20:01:7b:fc
-
- These error messages were received on mustard (a decstation).
-
-
-
- 623.
- Date: Fri, 27 Oct 89 10:35:54 PDT
- From: Fred Douglis <douglis>
- Subject: pmake circular dependency bug
-
- If pmake is given a makefile where a target depends on itself, rather
- than printing something about a circular dependency, it just says "not
- remade because of errors".
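-
- Detecting this is a standard depth-first search that looks for a "back edge": a dependency on a target that is still being visited. The sketch below uses that technique over a hypothetical adjacency-matrix representation. It says nothing about pmake's actual internal data structures. A target that depends on itself is the one-node case of such a cycle.
```c
#include <stdbool.h>

#define MAX_TARGETS 64   /* hypothetical limit for the sketch */

/* deps[i][j] is true when target i depends on target j. */
typedef struct {
    int n;
    bool deps[MAX_TARGETS][MAX_TARGETS];
} DepGraph;

enum { WHITE, GRAY, BLACK };   /* unvisited, on the current path, finished */

static bool Visit(const DepGraph *g, int t, int color[], int *cycleAt) {
    color[t] = GRAY;
    for (int d = 0; d < g->n; d++) {
        if (!g->deps[t][d])
            continue;
        if (color[d] == GRAY) {          /* back edge: d is on the current path */
            *cycleAt = d;
            return true;
        }
        if (color[d] == WHITE && Visit(g, d, color, cycleAt))
            return true;
    }
    color[t] = BLACK;
    return false;
}

/* Returns the index of a target on a dependency cycle, or -1 if there is none,
   so the caller can print "circular dependency involving <target>". */
int FindCycle(const DepGraph *g) {
    int color[MAX_TARGETS] = {0};   /* all WHITE */
    int cycleAt = -1;
    for (int t = 0; t < g->n; t++)
        if (color[t] == WHITE && Visit(g, t, color, &cycleAt))
            return cycleAt;
    return -1;
}
```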
-
-
-
-
- 624.
- Date: Fri, 27 Oct 89 15:48:43 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: printer bug
-
- When the laserwriter runs out of paper in the middle of a job, it won't finish
- the job after you refill it. It prints out a couple more sheets and thinks
- it's done.
-
-
-
- 625.
- Date: Fri, 27 Oct 89 16:12:07 PDT
- From: mgbaker (Mary Gray Baker)
- Subject: another printer problem?
-
- My job just got printed again, although I didn't request it. Maybe this
- has something to do with its having run out of paper before? Maybe it decided
- to print out another 2 pages and then wait for a while and then print the
- whole thing again?
-
-
-
-
- 626.
- Date: Mon, 30 Oct 89 14:54:18 PST
- From: tve (Thorsten von Eicken)
- Subject: problem with ranlib or ar on sun4
-
- library: /X11R3/src/lib/Xmu
- Let's see, Atoms.c declares (globally) a couple of variables and CvtStdSel.c
- uses them (take, for example _XA_HOSTNAME). When I compile and link the
- library on a sun4, programs using this library will not link because of
- symbol undefined errors (the symbols defined in Atoms.c and used in
- CvtStdSel.c). When I link the same library on a sun3 for a sun4, everything
- is perfect.
-
-
-
- 627.
- Date: Mon, 30 Oct 89 12:53:28 PST
- From: tve (Thorsten von Eicken)
- Subject: /sprite/lib/man/config
-
- I have X11R3 man pages in /mic/X11R3/man and I would like to get them when
- I type man. If I use the "-c configFile" switch, I have to make a copy of
- /sprite/lib/man/config and maintain that. Or I would have to edit the config
- file and add /mic/X11R3/man at the bottom (which some people might not like).
- Is there another way? Can one specify more than one config file to man?
-
-
-
- 628.
- Date: Mon, 30 Oct 89 10:40:54 PST
- From: Fred Douglis <douglis>
- Subject: loadavg & recovery
-
- a lot of hosts are listed as being down since sometime in the middle
- of the night. i think some sort of reopen must have failed. however,
- i don't see anything in paprika's syslog, for example, to account for
- the loadavg daemon just going away. if anyone has anything in their
- syslog pertaining to this (aside from "waiting for recovery" messages)
- please let me know.
-
-
-
- 629.
- Date: Mon, 30 Oct 89 11:15:24 PST
- From: brent (Brent Welch)
- Subject: Fs_PageRead recovery failed <1>
-
- Ever had programs die after recovery because of:
-
- 10/30/89 11:41:25 mint (32) Fs_PageRead waiting
- Fs_PageRead recovery failed <1>
- Warning: VmFileServerRead: Error 1 from Fs_Read or Fs_PageRead
- MachTrap: Bus error in user proc c2139, PC = e0075d6, addr = 30400 BR Reg 80
-
- It can happen if you are running a program that has been
- changed recently by removing the image and copying in a
- new one. While the server is
- up it doesn't delete the old version of the program
- because it knows it is being executed. However, after
- a reboot "the right thing" doesn't happen. Recovery
- seems to go ok, but later on when you fault on the
- code segment you get a paging error and your program
- dies. It seems like the right thing could still happen
- because the old program images end up in lost+found
- (I can see the old version of mx there right now,
- for example, which was the program that died on me.)
-
-
-
- 630.
- Date: Mon, 30 Oct 89 14:00:14 PST
- From: ouster (John Ousterhout)
- Subject: Time change
-
- Messages in my /dev/syslog are coming out with the wrong hour
- (daylight savings time, still), whereas my xclock is OK and other
- programs seem to be OK. Is this a bug in the kernel?
- -John-
-
-
-
-
- 630.
- Date: Mon, 30 Oct 89 18:03:53 PST
- From: fubar (Jay Vosburgh)
- Subject: Bug: ls man page
-
- The file type specifier 'r' (for remote link, or whatever it's
- called) in the output of "ls -l" isn't documented in the man page...
-
-
-
- 631.
- Date: Tue, 31 Oct 89 10:22:37 PST
- From: rbk (Bob Beck)
- Subject: Need device driver interface document
-
- For sprite drivers. This would be a big help in porting Sprite to
- other machines, where drivers exist but have (eg) BSD or SysV kernel
- interfaces.
-
-
-
- 632.
- Date: Tue, 31 Oct 89 10:33:46 PST
- From: rbk (Bob Beck)
- Subject: Need "md" module interface definitions document for Sprite
-
- This should help in porting Sprite to new machines, by avoiding ambiguity in
- which procedures there are and what they actually must do. In the absence of
- such documentation, you have to look at existing modules and determine what
- the true needs are, filtering out machine dependencies. On talking with
- John, it would seem a list of relevant procedures and maybe a 1-liner about
- the procedure is sufficient, if the procedure documentation (header)
- specifies the interface well enough.
-
-
-
- 633.
- Date: Tue, 31 Oct 89 11:18:16 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: user/kernel time wrong
-
- The user and kernel time statistics on a multiprocessor are wrong.
- The trap handlers have to be changed to mark the current processor
- as being in kernel mode. Right now this only happens on interrupts, where
- it looks to see what mode it was in before the interrupt. This works
- fine on a uniprocessor but not on a multiprocessor.
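-
- One way to read the proposed fix, sketched with hypothetical names and layout. Trap entry and return record each processor's mode in a per-processor flag, and a clock tick charges time from those flags instead of from the mode of whichever processor happened to take the interrupt. The assumption that one processor's clock handler charges every processor is made here for illustration only.
```c
#include <stdbool.h>

#define NUM_CPUS 4   /* hypothetical processor count */

/* Per-processor accounting state (hypothetical layout). */
typedef struct {
    bool inKernel;              /* set on trap/interrupt entry, cleared on return to user */
    unsigned long userTicks;
    unsigned long kernelTicks;
} CpuTimes;

static CpuTimes cpuTimes[NUM_CPUS];

/* Trap handlers, not just interrupt handlers, mark the processor's mode. */
void TrapEnter(int cpu)        { cpuTimes[cpu].inKernel = true; }
void TrapReturnToUser(int cpu) { cpuTimes[cpu].inKernel = false; }

/* Clock tick handled on one processor: charge every processor according
   to its own recorded mode. */
void ClockTickAll(void) {
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (cpuTimes[cpu].inKernel)
            cpuTimes[cpu].kernelTicks++;
        else
            cpuTimes[cpu].userTicks++;
    }
}
```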
-
-
-
- 634.
- Date: Tue, 31 Oct 89 11:48:18 PST
- From: rbk (Bob Beck)
- Subject: MASTER_UNLOCK doesn't do test-and-set
-
- This can be a problem if the cache architecture of the machine
- doesn't support an "ownership" protocol -- eg, on Sequent Symmetry,
- if the cache is running "write-thru", both the "acquire lock" and
- "release lock" must do test-and-set (just doing a write on the
- mutex variable can race with an attempt to acquire the lock).
- However, the current code "(semaphore)->value = 0;" does work on
- Symmetry running copy-back cache mode.
-
- Just thought you would be interested -- the MASTER_UNLOCK()
- implementation isn't truly machine independent, although it's defined
- in /sprite/src/kernel/sync/sync.h
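-
- For comparison, here is how the same two operations look with C11 atomics.
- This is a sketch, not Sprite's MASTER_LOCK/MASTER_UNLOCK; MasterLock and the
- function names are hypothetical. The release uses an atomic exchange (a
- read-modify-write, like the test-and-set Bob describes) instead of a plain
- assignment to the semaphore value.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for Sprite's master lock: 0 = free, 1 = held. */
typedef struct {
    atomic_int value;
} MasterLock;

/* Acquire: spin on an atomic test-and-set (exchange) until it returns 0. */
static void MasterLock_Acquire(MasterLock *lock)
{
    while (atomic_exchange_explicit(&lock->value, 1, memory_order_acquire) != 0) {
        /* spin */
    }
}

/* Release with an atomic read-modify-write rather than a plain store.
 * Returns the previous value so a caller can detect releasing a lock
 * that was not held. */
static int MasterLock_Release(MasterLock *lock)
{
    return atomic_exchange_explicit(&lock->value, 0, memory_order_release);
}
```

- In C11 terms an atomic_store_explicit(..., memory_order_release) is already a
- correct release; the point of the abstraction is that the compiler for a given
- machine, not the machine-independent source, picks the instruction sequence
- that machine's cache architecture needs.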
-
-
-
- 635.
- Date: Tue, 31 Oct 89 14:48:53 PST
- From: brent (Brent Welch)
- Subject: RPC binding hosed
-
- Assault went into the same state that Mint and Allspice
- got into Monday morning. You could talk with Assault
- from some machines, but not others. I poked around and noticed
- (via rpcstat -sinfo) that many RPC requests were being
- dropped on the floor (the "noalloc" field). I didn't drop Assault into the
- debugger (kdbx fear), but when I rebooted it there was
- one hung kernel process, Rpc_Daemon. This is the guy
- that's in charge of creating new server processes and for
- closing up connections on idle channels. With this process
- hung only requests over currently existing channels were
- accepted, and no dynamic re-binding of server processes happens.
- I'll go look at the code and see if I can figure out why
- Rpc_Daemon hung itself.
-
-
-
- 636.
- Date: Tue, 31 Oct 89 14:50:09 PST
- From: rbk (Bob Beck)
- Subject: sync/syncLock.c has assumptions about memory system ordering of reads and writes
-
- Sync_SlowLock() (and others, I suspect) seem to rely on the memory
- system doing things in "right" order -- ie, Sync_GetLock() may call
- Sync_SlowLock() which will try to T&S the inUse variable, then set
- waiting=TRUE, then try the T&S again... However, Sync_Unlock()
- just writes inUse=0 *then* tests waiting... Although I think this
- works on Symmetry, it's not clean and relies on strict order of
- reads and writes to processor cache; ie, if Sync_Unlock()'s read
- of waiting passed its write of inUse, this code would race and
- fail. I would prefer to see this with explicit locking of the
- "Lock" variable that avoids these problems -- ie, state manipulation
- of the Lock variable while holding a mutex inside the variable
- (Sequent's kernel mutex abstractions all behave this way). This (I
- think) is much more clear, and most/all MP systems will provide
- guarantees on cache/memory writes being done when a T&S completes
- (eg, to unlock the data-structure). I think some of the higher
- performing RISC parts due out in a year or so may violate the
- assumptions you're making here.
-
- On a further note, sufficiently highly optimizing compilers might
- take it upon themselves to re-order some of these statements.
- Volatile declarations may help, but may be too strong. Some people
- (eg, Sequent) are making the compilers sensitive to various procedures
- (eg, v_lock()) to know this is a mutual exclusion point, code cannot
- be moved across this boundary, and the HW ensures previous writes
- are flushed when a T&S write completes.
-
- This dependency should be documented, if not resolved otherwise.
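-
- The ordering requirement can be written down explicitly. A minimal model using
- C11 sequentially consistent atomics, assuming the lock is reduced to two
- hypothetical fields, inUse and waiting (the real Sync_Lock layout and wakeup
- path are not shown): because both sides use seq_cst operations, the unlocker's
- store to inUse cannot be overtaken by its load of waiting, and the compiler may
- not reorder them either, so either the unlocker sees the waiter or the
- waiter's retry sees the lock free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two fields discussed above. */
typedef struct {
    atomic_int inUse;    /* 1 while some process holds the lock */
    atomic_int waiting;  /* 1 if some process may be sleeping on it */
} SyncLockModel;

/* Slow-path acquire attempt: test-and-set, announce a waiter, retry once.
 * Returns true if the lock was obtained; false means the caller would sleep. */
static bool SlowLockAttempt(SyncLockModel *l)
{
    if (atomic_exchange(&l->inUse, 1) == 0) {
        return true;
    }
    atomic_store(&l->waiting, 1);              /* seq_cst store */
    return atomic_exchange(&l->inUse, 1) == 0; /* seq_cst read-modify-write */
}

/* Release: the seq_cst store to inUse and the seq_cst load of waiting stay
 * in program order, so the store->load reordering Bob describes cannot
 * happen. Returns true if a wakeup is needed. */
static bool UnlockNeedsWakeup(SyncLockModel *l)
{
    atomic_store(&l->inUse, 0);
    return atomic_load(&l->waiting) != 0;
}
```

- Bob's alternative, protecting both fields with a small mutex inside the lock,
- avoids reasoning about store-to-load ordering at all, at the cost of an extra
- atomic operation on every unlock.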
-
-
-
- 637.
- Date: Wed, 01 Nov 89 00:57:21 PST
- From: Fred Douglis <douglis>
- Subject: prefix bugs
-
- i wanted to make a simple change: make a ds3100 export /tmp. when i
- deleted /tmp from oregano's prefix table, though, it stopped dealing
- with other prefixes (i'd get "/c unreadable" even if i deleted it
- and rebroadcasted). i had to remove /tmp from /t1/hosts/oregano/mount
- and reboot oregano. then everything was okay, except that hosts
- with entries for /tmp were able to keep accessing /c/tmp even
- though oregano wasn't exporting. i could then explicitly delete
- /tmp and force a rebroadcast and that worked.
-
-
-
- 638.
- Date: Tue, 31 Oct 89 18:02:07 PST
- From: Fred Douglis <douglis>
- Subject: emacs & ipServer on ds3100
-
- seems to be a ds3100 bug where killing the X server without killing an
- emacs client will leave the ipServer in an infinite loop. be advised
- in the meantime that exiting emacs explicitly is probably a Good
- Thing.
-
-
-
- 639.
- Date: Wed, 1 Nov 89 01:15:48 PST
- From: douglis (Fred Douglis)
- Subject: bug with permissions caching
-
- I am using /dist/dist/sprite/cmds.ds3100 as /sprite/cmds.ds3100 for my
- benchmarking. it contained no setuid files, so I found all the setuid
- files in the old cmds.ds3100 and made the new ones setuid. nevertheless,
- i couldn't run rlogin, even when i confirmed it was setuid root.
- however, copying the same file using update -O produced a file i could
- execute okay. looks like maybe sprite remembers the protection somehow??
-
-
- 640.
- Date: Wed, 1 Nov 89 08:30:54 PST
- From: brent (Brent Welch)
- Subject: decstation fonts
-
- Once again my spritemon is messed up because of some quirk
- in the decstation fonts. I switched over to Xmfb.new so
- I know that caused it. However, I'm frustrated because
- the font stuff is black magic, and I hate that. I'd really
- like a 'fonts' man page so I can figure things out myself
- instead of having to whine to the bugs mailing list. Can
- someone start a font man page?
- brent
- p.s. I know I've complained about this before, and I've probably
- gotten a good answer. However, this breaks at such long intervals
- that I've forgotten the magic incantation. We need a man page.
-
-
-
- 641.
- Date: Wed, 1 Nov 89 15:19:16 PST
- From: brent (Brent Welch)
- Subject: Mint is ailing
-
- Well, we've been having some troubles with Mint, haven't we?
- It seems to get into states of overload and then begins to misbehave.
- I want to fully understand things before I go hacking away,
- however. First, as a user, don't hesitate to send me mail
- if the system craps out on you and you resort to rebooting
- your client. Ideally you shouldn't have to do that, and
- I'd like to know about it if it happens. In the meantime
- I'm going to augment Mint's kernel with some hooks so I
- can get at its recovery-related state. It seems to get
- into modes where it thinks all the clients have rebooted,
- so it yanks the rug out from under them. This triggers
- recovery actions by clients, which then overloads mint.
- I know how to tune the client side so that recovery loads
- mint less, but first I want to understand why mint freaks
- out in the first place.
-
-
-
- 642.
- Date: Wed, 1 Nov 89 18:13:01 PST
- From: brent (Brent Welch)
- Subject: Re: Something bad about caching?
-
- Fenugreek is importing /sprite/lib/ from assault. I ran 'stat' on
- the files because I suspected something like this:
-
- <sage 892> stat /sprite/lib/include/sysStats.h
- --rw-r--r-- 1 ID=(1471,155) 8310 bytes /sprite/lib/include/sysStats.h
- Server Domain File #
- 32 1 90339
- Version 62 UserType 0x0
- Created: Nov 1 15:54:20 1989
- Data modified: Nov 1 16:55:31 1989
- Descr. modified: Nov 1 16:55:31 1989
- Last accessed: Nov 1 17:06:44 1989
-
- <fenugreek 2> stat /sprite/lib/include/sysStats.h
- --r--r--r-- 1 ID=(0,0) 8172 bytes /sprite/lib/include/sysStats.h
- Server Domain File #
- 25 2 5612
- Version 3 UserType 0x0
- Created: Oct 26 20:51:26 1989
- Data modified: Oct 10 16:27:30 1989
- Descr. modified: Oct 26 20:51:26 1989
- Last accessed: Nov 1 17:00:40 1989
-
- So this is a prefix bug, not a caching bug. It seems straightforward
- to fix the prefix bug. I can mark exported prefix handles
- specially on the server, and verify this on naming operations.
- This would ensure that naming operations are denied when
- the server stops exporting a prefix.
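-
- A sketch of the proposed server-side check, with a hypothetical PrefixHandle
- structure and status codes (Sprite's real handle types and error codes are
- not shown here): the mark is set when a prefix is exported and cleared when
- export stops, and every naming operation checks it before doing anything else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; Sprite's real codes may differ. */
typedef enum { NAME_OK, NAME_NOT_EXPORTED } NameStatus;

/* Hypothetical server-side prefix handle with an "exported" mark. */
typedef struct {
    const char *prefix;   /* e.g. "/sprite/lib" */
    bool exported;        /* set on export, cleared when export stops */
} PrefixHandle;

void Prefix_Export(PrefixHandle *h)   { h->exported = true; }
void Prefix_Unexport(PrefixHandle *h) { h->exported = false; }

/* Checked at the start of every naming operation that arrives relative to
 * a prefix handle, so clients with stale prefix-table entries are refused
 * instead of being served from a prefix that is no longer exported. */
NameStatus Prefix_CheckNamingOp(const PrefixHandle *h)
{
    if (h == NULL || !h->exported) {
        return NAME_NOT_EXPORTED;
    }
    return NAME_OK;
}
```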
-
-
-
- 643.
- Date: Thu, 2 Nov 89 13:05:31 PST
- From: brent (Brent Welch)
- Subject: Re: bad decStation kernel
-
- From what I saw last night, the new ds3100 kernel was
- dying in Mach_TestAndSet with a "bad address on load".
- Any recent changes to the mach module?
-
-
-
- 644.
- Date: Thu, 2 Nov 89 13:29:03 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: watchdog reset
-
- Thyme just suffered a watchdog reset running kernel SPRITE VERSION BW.183.
- According to Brent this is very similar to the installed kernel.
- I wasn't able to get anything from the prom -- it looked like the pc and
- sp had been reset.
-
-
-
- 645.
- Date: Thu, 2 Nov 89 13:56:13 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: loadavg wrong for MP
-
- Loadavg doesn't deal with more than one processor and will compute cpu
- utilization wrong on a multi-processor. I don't have time to fix it
- right now so this message serves as a reminder to do it later.
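-
- One way to compute machine-wide utilization on a multiprocessor is to sum busy
- time over all processors and divide by the total time available on all of
- them. A sketch, assuming a hypothetical per-processor CpuSample of busy and
- total ticks for one sampling interval:

```c
/* Hypothetical per-processor sample for one sampling interval. */
typedef struct {
    unsigned long busyTicks;
    unsigned long totalTicks;
} CpuSample;

/* Machine-wide utilization as a fraction in [0, 1]: busy time over
 * available time, both summed across all processors. A machine with
 * every processor pegged reports 1.0. */
double Utilization(const CpuSample *samples, int numProcessors)
{
    unsigned long busy = 0, total = 0;
    for (int i = 0; i < numProcessors; i++) {
        busy += samples[i].busyTicks;
        total += samples[i].totalTicks;
    }
    return total == 0 ? 0.0 : (double)busy / (double)total;
}
```

- With two processors, one pegged and one idle, this reports 0.5.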
-
-
-
- 646.
- Date: Fri, 3 Nov 89 08:21:05 PST
- From: rbk (Bob Beck)
- Subject: Misc header file glitches
-
- John had asked me to notice procedure headers that seem "weak" or
- otherwise questionable -- ie, not sufficient specification of the
- procedure... I didn't find many (yet ;-), but thought I'd pass
- these along...
-
- /sprite/src/kernel/vm/spur.md/vmSpur.c VmMach_BootInit()
-
- Semantics and interface not well specified
-
- /sprite/src/kernel/sync/syncLock.c Sync_GetLock()
-
- Semantics not specified other than "this is kernel version".
-
- /sprite/src/kernel/rpc/rpcCall.c
-
- comment at top talks about lust:~brent/src/sun/sys/h/rfs.h
- -- is this still valid? ~brent/src/sun/sys/h/rfs.h doesn't
- exist on the Sprite network.
-
- Sig_Send() has a comment: "When we go to a multi-processor this
- routine must be rewritten to possibly interrupt a running process".
- Is this comment still valid? It looks like Sync_WakeWaitingProcess()
- handles waking the other processor...
-
-
-
- 647.
- Date: Fri, 3 Nov 89 11:28:09 PST
- From: shirriff (Ken Shirriff)
- Subject: sed bug
-
- ls | sed -e "/e/x\
- /e/p"
- causes a segmentation violation in sed.
-
-
-
- 648.
- Date: Fri, 03 Nov 89 15:40:56 PST
- From: Fred Douglis <douglis>
- Subject: tftpd dregs
-
- mint has about a half-dozen tftpd processes lying around. I don't
- know which ones to kill, or why they're not dying. I thought this bug
- had been fixed a while ago.
-
-
-
- 649.
- Date: Fri, 03 Nov 89 15:55:05 PST
- From: Fred Douglis <douglis>
- Subject: eviction/loadavg bug
-
- i noticed that sage was listed as being down; debugging it showed it
- was in the middle of an eviction request. looks like it's possible
- for an eviction to get lost, or in any case there may be some race
- condition. next time someone notices loadavg getting wedged, please
- let me know so i can debug the kernel to see where the process is and
- the internal kernel state relating to eviction.
-
-
-
- 650.
- Date: Fri, 3 Nov 89 16:25:45 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: gethostent
-
- The routine gethostent() appears in the gethostbyname man page, but
- does not exist in the C library.
-
-
-
- 651.
- Date: Fri, 03 Nov 89 17:47:05 PST
- From: Fred Douglis <douglis>
- Subject: ds3100 ar/ranlib status?
-
- i'm really getting fed up with ar appending new copies into libraries
- instead of replacing the old ones. I was fed up enough to try to
- build a new ar and fix the problem. The catch is, it was just
- recently recompiled, and the new one worked fine when given the same
- command (ar r ....). I looked in the sprite log and it seems the
- problem is really with ranlib: the sprite ranlib wouldn't compile (and
- still won't), and the ultrix ranlib wouldn't work with our ar.
-
- as a temporary fix, i am going to install sprite's ar as ar.sprite,
- and change biglib.mk to invoke ar.sprite instead of ar for
- decstations. as a more permanent fix, we need to fix ranlib. I took
- a look at it and don't think it looks good -- the a.out hdr formats
- and constants and macros are all too different. I started trying to
- convert it but found that I'm missing the "symbol table offset" that's
- there for the suns. maybe bob knows more about this stuff and can
- take a look sometime?
-
-
-
- 652.
- Date: Sat, 4 Nov 89 16:02:20 PST
- From: tve (Thorsten von Eicken)
- Subject: ranlib on sun4 very flaky
-
- I mentioned this before... try:
- cd /sprite/src/lib/dbm; pmake clean; pmake # I did "pmake installdebug" but
- # I suppose "pmake" will do it too.
- ... and watch the ranlib go into debug state...
-
-
-
- 653.
- Date: Sat, 4 Nov 89 16:44:35 PST
- From: tve (Thorsten von Eicken)
- Subject: sed problem on sun4s
-
- The following, found in /sprite/lib/mkmf/mkmf.top, doesn't work on sun4's
- because sed doesn't output the last line of input if it isn't terminated
- by a newline. I.e. after the "tr" command above, the input to sed is a single
- line without terminating newline. Sed on the sun4 will not output anything
- at all.
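-
- The behavior a line-oriented filter should have can be shown with POSIX
- getline(), which returns a final unterminated line like any other. This is a
- sketch, not sed's source; CountLines is a hypothetical helper that counts each
- line a filter would process and emit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count input lines, treating a last line without a terminating newline
 * as a line. getline() returns the number of bytes read (including the
 * newline when present) and -1 only at end of input or on error. */
size_t CountLines(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    size_t count = 0;
    while (getline(&line, &cap, in) != -1) {
        count++;
    }
    free(line);
    return count;
}
```

- With the input "one\ntwo" (no trailing newline) it counts two lines.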
-
-
-
- 654.
- Date: Sat, 4 Nov 89 16:59:36 PST
- From: tve (Thorsten von Eicken)
- Subject: are process ids guaranteed to be unique in the network?
-
- Or is there just a "high probability" that they are unique?
- Doing pmakes on sun3's I often get (compiling for sun4):
- --- sun4.md/XCopyArea.o ---
- rm -f sun4.md/XCopyArea.o
- cc -DERRORDB=\"/X11R3/src/lib/X11/XErrorDB\" -DTCPCONN -DFONT_SNF -DFONT_BDF -DCOMPRESSED_FONTS -DSPRITE -Usprite -Uunix -Uultrix -DINCLUDE_ALLOCA_H -I/X11R3/lib/include -O -msun4 -Dsprite -Dsun4 -I. -Isun4.md -I/X11R3/lib/include -I/X11R3/lib/include/X11 -traditional -fwritable-strings -finline-functions -fstrength-reduce -c XCopyArea.c -o sun4.md/XCopyArea.o
- /sprite/cmds.sun3/cpp: /tmp/cc727631.cpp: invalid argument
- *** Error code 1
- pmake: 1 error
- *** Error code 1
- pmake: 1 error
-
- and when I restart pmake everything is fine. Dunno what's going on!
-
-
-
- 655.
- Date: Sat, 04 Nov 89 20:44:56 PST
- From: Fred Douglis <douglis>
- Subject: dumps didn't complete
-
- I saw that the dumps hadn't run last night, and that Bob apparently
- wasn't around, so I tried running them. I ran "dailydump" on murder,
- and it seemed to do /user1 and /user2 just fine but then died on
- /sprite with
-
- -: I/O error
-
-
-
- 656.
- Date: Sun, 5 Nov 89 00:28:04 PST
- From: tve (Thorsten von Eicken)
- Subject: /sprite/lib/sun4.md/libc.a:socket.o:_Stat_PrintMsg
-
- This symbol is undefined. I can't link any of my X stuff!! please fix quick!
- To test:
- cd /X11R3/src/cmds/Xsp; pmake TM=sun4
- Sample:
- --- sun4.md/Xcfb ---
- rm -f sun4.md/Xcfb
- cc -g -O -msun4 -Dsprite -Dsun4 -o sun4.md/Xcfb ddx/snf/sun4.md/linked.o ddx/mi/sun4.md/linked.o ddx/mfb/sun4.md/linked.o ddx/cfb/sun4.md/linked.o ddx/sprite/sun4.md/linked.o dix/sun4.md/linked.o os/sprite/sun4.md/linked.o -ldbm -lm
- socket.o: Undefined symbol _Stat_PrintMsg referenced from text segment
-
-
-
- 657.
- Date: Sun, 5 Nov 89 01:08:45 PST
- From: tve (Thorsten von Eicken)
- Subject: no gcore for sun4's
-
- Could someone please compile/make one?
-
-
-
- 658.
- Date: Sun, 5 Nov 89 01:08:19 PST
- From: tve (Thorsten von Eicken)
- Subject: ipServer on crackle (sun4) in debug state
-
- Sorry, no core dump -> gcore doesn't exist
- Sorry, no backtrace -> /sprite/src/daemons/ipServer/sun4.md is empty
- Sorry, no backtrace -> /sprite/daemons.sun4/ipServer has no symbol table
- ... good job!
-
-
-
- 659.
- Date: Mon, 06 Nov 89 11:26:58 PST
- From: Fred Douglis <douglis>
- Subject: sun4/sun4c (emacs) incompatibility
-
- it seems that the same dumped version of emacs can't run on both
- vanilla sun4s and sun4c's. Since the predominant type of sun4s is, or
- will be, sun4c, I'm going to make the default version of emacs be the
- sun4c flavor. I'll move the other one to /emacs/cmds.sun4/emacs.sun4
- (emacs.sun4c and emacs will be the same).
-
- As an alternative, I could remake emacs for the sun4 with CANNOT_DUMP
- defined, so it might start up okay on both types but would take
- "forever" to get going. Let me know if you have a strong preference.
-
- I am cc'ing bugs on this because it suggests we may have to consider
- methods for distinguishing between sun4s and sun4c's at user level (in
- %MACHINE).
-
-
-
- 660.
- Date: Mon, 6 Nov 89 12:32:23 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: mx bug/feature
-
- If I run mx on multiple files (mx *.c) the first time I use the "next"
- command to get to the next file it is a no-op. The second usage gets
- me to the second file, after which all uses work properly.
-
- 661.
- Date: Mon, 13 Nov 89 10:03:32 PST
- From: Fred Douglis <douglis>
- Subject: setjmp
-
- I checked, and the ds3100 is the only one that doesn't have _setjmp.o.
- It has setjmp.o. The ultrix libc.a has both. Any idea whether we
- used to have _setjmp.o? The real question is, can we restore it from
- tape, or do we grab the ultrix .o file, or what?
-
-
-
- 662.
- Date: Mon, 6 Nov 89 14:27:02 PST
- From: tve (Thorsten von Eicken)
- Subject: lps40 access for new machines
-
- crackle has no access to the lps40. I guess there is the same problem with
- burble, buzz, treason
-
-
-
- 663.
- Date: Mon, 6 Nov 89 22:16:52 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: Rpc_ChanFree: freeing free channel
-
- Lust just crashed trying to free an already free channel. The structure
- looks ok to me, but the state is 0. I don't see any way in which this
- could have happened, but it did. Has anyone changed anything in the
- rpc module that could have caused this? I have a copy of the stack
- backtrace if it is helpful.
-
-
-
- 664.
- Date: Mon, 6 Nov 89 23:41:51 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: oregano crash
-
- I tried to use the IOC_SCSI_COMMAND ioctl to get the size of a disk
- attached to oregano and oregano died. When the scsi command completed
- RequestDone in sun3.md/devSCSI3.c called scsiDoneProc and passed it
- a senseDataPtr of 0. scsiDoneProc then died with a bus error.
- I know this ioctl works on a sun4 using a Jaguar, but the code doesn't
- look too different so I can't figure it out. All of the data structures
- looked ok, so I think there is just a goof in the flow of control
- when using this ioctl.
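-
- A defensive version of the done-callback, as a sketch: the signature and the
- ScsiResult codes are hypothetical, not the ones in sun3.md/devSCSI3.c. It
- treats a NULL senseDataPtr as "no sense data available" instead of
- dereferencing it. (SCSI status 0 is GOOD and 2 is CHECK CONDITION.)

```c
#include <stddef.h>

/* Hypothetical result codes for a completed SCSI command. */
typedef enum { SCSI_OK, SCSI_CHECK_CONDITION, SCSI_NO_SENSE_DATA } ScsiResult;

/* A completion path may legitimately pass senseDataPtr == NULL, so it is
 * checked before anything could be dereferenced. */
ScsiResult ScsiDoneProc(int statusByte, const unsigned char *senseDataPtr,
                        size_t senseLength)
{
    if (statusByte == 0) {
        return SCSI_OK;               /* GOOD status: sense data irrelevant */
    }
    if (senseDataPtr == NULL || senseLength == 0) {
        return SCSI_NO_SENSE_DATA;    /* error, but nothing to decode */
    }
    return SCSI_CHECK_CONDITION;      /* sense data present, safe to decode */
}
```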
-
-
-
- 665.
- Date: Tue, 7 Nov 89 09:01:35 PST
- From: mendel (Mendel Rosenblum)
- Subject: Re: ranlib on sun4 very flaky
-
- > I mentioned this before... try:
- > cd /sprite/src/lib/dbm; pmake clean; pmake # I did "pmake installdebug" but
- > # I suppose "pmake" will do it too.
- > ... and watch the ranlib go into debug state...
- > Thorsten
-
- Actually, "pmake" works. "pmake installdebug" didn't until I reinstalled
- ranlib. There have been many cases of stale object files in /sprite/cmds.sun4.
- I think we should reinstall all the sun4 commands.
-
-
-
- 666.
- Date: Tue, 7 Nov 89 09:42:05 PST
- From: tve (Thorsten von Eicken)
- Subject: tftpd on crackle ?!
-
- I just rebooted, and am getting "inetd[...]: /sprite/daemons/tftpd: exit status 0x100"
- messages every minute or so (on the console).
-
-
-
- 667.
- Date: Tue, 7 Nov 89 09:48:41 PST
- From: mendel (Mendel Rosenblum)
- Subject: Re: tftpd on crackle ?!
-
- Some host on the net sends tftp requests to the broadcast address when trying
- to boot. Since the tftpd daemon was never installed on the sun4s but was
- listed in the inet.conf file, inetd would try to exec /sprite/daemons/tftpd
- when each request came in. I've installed tftpd for the sun4 so you should
- not see the message anymore.
-
-
-
- 668.
- Date: Tue, 07 Nov 89 13:55:45 PST
- From: Fred Douglis <douglis>
- Subject: Re: Down machines
-
- I noticed that. looks like some sort of migration bug, in that the
- "eviction request" may not have returned properly. why either host
- thought it had a foreign process is beyond me. however, loadavg.new
- lists mint as up, and that has the timeout i mentioned in the meeting,
- so i'm pretty sure that's the case. (i also noticed mint listed as
- "hasmig" shortly after it rebooted, and was planning to look into that
- at some point. difficult, though, when it's the file server.)
-
-
-
- 669.
- Date: Tue, 7 Nov 89 14:17:13 PST
- From: tve (Thorsten von Eicken)
- Subject: what version of gcc on sun4's???
-
- I thought we had moved to gcc 1.36 a while ago? But look:
- cc -v -S goo.c
- gcc version 1.36
- target machine is sun4
- /sprite/cmds.sun4/cpp -v -msun4 -undef -D__GNUC__ -Dsparc -Dsun4 -Dunix -Dsprite -D__SOFT_FLOAT__ goo.c /tmp/cc669451.cpp
- GNU CPP version 1.34
- /sprite/cmds.sun4/cc1.sparc /tmp/cc669451.cpp -quiet -dumpbase goo.c -version -o goo.s
- GNU C version 1.34 (sparc) compiled by GNU C version 1.34.
-
-
- 670.
- Date: Tue, 7 Nov 89 14:31:20 PST
- From: tve (Thorsten von Eicken)
- Subject: gcc floating point confusion on sun3's
-
- What is going on? When is the 68881 used and when not? When is __SOFT_FLOAT__
- defined and when not? When do cpp, cc1 and as agree?
-
- [sassafras foo] cc -O -o goo68 goo.c -v -m68881
- gcc version 1.36
- target machine is sun3
- /sprite/cmds.sun3/cpp -v -msun3 -undef -D__GNUC__ -Dmc68000 -Dsun3 -Dunix -Dsprite -D__OPTIMIZE__ goo.c /tmp/cc728371.cpp
- GNU CPP version 1.36
- /sprite/cmds.sun3/cc1.68k -msoft-float -m68020 /tmp/cc728371.cpp -quiet -dumpbase goo.c -m68881 -O -version -o /tmp/cc728371.s
- GNU C version 1.36 (68k, MIT syntax) compiled by GNU C version 1.36.
- default target switches: -m68020 -mc68020 -m68881 -mbitfield -msun3
- /sprite/cmds.sun3/as -m68020 /tmp/cc728371.s -o goo.o
- /sprite/cmds.sun3/ld -X -e start -o goo68 -L/sprite/lib/sun3.md goo.o -lc
-
-
- [sassafras foo] cc -O -o goo68 goo.c -v -msoft-float
- gcc version 1.36
- target machine is sun3
- /sprite/cmds.sun3/cpp -v -msun3 -undef -D__GNUC__ -Dmc68000 -Dsun3 -Dunix -Dsprite -D__SOFT_FLOAT__ -D__OPTIMIZE__ goo.c /tmp/cc531771.cpp
- GNU CPP version 1.36
- /sprite/cmds.sun3/cc1.68k -msoft-float -m68020 /tmp/cc531771.cpp -quiet -dumpbase goo.c -msoft-float -O -version -o /tmp/cc531771.s
- GNU C version 1.36 (68k, MIT syntax) compiled by GNU C version 1.36.
- default target switches: -m68020 -mc68020 -m68881 -mbitfield -msun3
- /sprite/cmds.sun3/as -m68020 /tmp/cc531771.s -o goo.o
- /sprite/cmds.sun3/ld -X -e start -o goo68 -L/sprite/lib/sun3.md goo.o -lc
-
- -------
- I.e: it seems __SOFT_FLOAT__ is always defined, -msoft-float is default (yuck!)
-
- Thorsten (and Andreas who pointed me at this)
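-
- A quick way to see which model the preprocessor was told about, independent of
- what cc1 and as were given, is to compile a file like this with the flags in
- question. A sketch; SoftFloatDefined is a hypothetical helper name.

```c
/* Returns 1 if the preprocessor defined __SOFT_FLOAT__ for this
 * compilation, 0 otherwise. Compile with -m68881 and again with
 * -msoft-float and compare the results with the cc1 flags shown by -v. */
int SoftFloatDefined(void)
{
#ifdef __SOFT_FLOAT__
    return 1;
#else
    return 0;
#endif
}
```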
-
-
- 671.
- Date: Tue, 7 Nov 89 14:32:50 PST
- From: tve (Thorsten von Eicken)
- Subject: the csh notion of process time on the sun4 is broken.
-
- I guess it just needs to be recompiled? Witness:
- [crackle foo] time ./goo
- i=100000 b=(INFINITY)
- 0.0u 0.0s 0:43 0% 0+0io 0pf+0sw 0k
- [lots of zeros here!]
-
-
-
- 672.
- Date: Tue, 7 Nov 89 16:50:12 PST
- From: shirriff (Ken Shirriff)
- Subject: Transient cc bug.
-
- While compiling fsCacheConsist.c for the ds3100, I got:
- ugen: internal L line 767 : build.p, line 1743
- unexpected u-code.
- I tried it again and I didn't get this.
-
-
-
- 673.
- Date: Tue, 7 Nov 89 17:46:34 PST
- From: eklee (Edward K. Lee)
- Subject: pmake profile does not work for libraries
-
-
-
- 674.
- Date: Tue, 7 Nov 89 17:56:32 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: thyme crash, CallFunc: Process queue full
-
- Thyme crashed when the Proc_ServerProc queue filled up. All of the server
- procs were ready, and one was running. The queue was full of calls to
- TransferInProc. It looks like something (interrupt handler?) was stuffing
- calls to this procedure into the queue faster than the server procs
- could handle them. Here is the backtrace of the procedure that
- discovered the full queue:
-
- #0 panic (_va_args=235192782) (sysPrintf.c line 209)
- #1 0xe04c324 in CallFunc (funcInfoPtr=(FuncInfo *) 0xe80ff38) (procServer.c line 544)
- #2 0xe04bc70 in Proc_CallFunc (func=(void (*)()) 0xe014044, clientData=(ClientData) 0xe07e5c4, interval=0) (procServer.c line 174)
- #3 0xe01403a in DevTtyInputChar (ttyPtr=(struct DevTty *) 0xe07e5c4, value=56) (devTty.c line 536)
- #4 0xe00995a in DevConsoleInputProc (ttyPtr=(struct DevTty *) 0xe07e5c4, value=56) (sun3.md/devConsole.c line 328)
- #5 0xe014090 in TransferInProc (ttyPtr=(struct DevTty *) 0xe07e5c4, callInfoPtr=(Proc_CallInfo *) 0xe80ffd8) (devTty.c line 577)
- #6 0xe04c04c in Proc_ServerProc () (procServer.c line 376)
- #7 0xe056048 in Sched_StartKernProc (func=(void (*)()) 0xe04be58) (schedule.c line 944)
- (gdb)
-
- Thyme aborted out of the debugger, ignored the watchdog reset button,
- and suffered watchdog resets in the prom, so perhaps this is a hardware
- problem.
-
-
-
- 675.
- Date: Tue, 7 Nov 89 18:08:12 PST
- From: brent (Brent Welch)
- Subject: Blocking Fs_PageRead clogs the system
-
- The VM system uses the Proc_ServerProcs to fill pages
- during a page fault. The problem is that Fs_PageRead
- blocks during recovery, and this can use up all the
- Proc_ServerProcs. Both sloth and thyme died because
- the Proc_CallFunc queue filled up. It couldn't be
- serviced because all the Proc_ServerProcs were blocked
- on recovery inside Fs_PageRead. This fix has to be
- inside the VM system. It has to figure out what to
- do if Fs_PageRead returns EWOULDBLOCK (or something)
- so that Fs_PageRead doesn't block. I know the VM system
- already does some recovery waits because it uses the
- handle of the swap directory for this.
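-
- A sketch of the non-blocking shape suggested above, with hypothetical names
- throughout (PageReadNonBlocking, ServePageFill, and the serverInRecovery flag
- stand in for Fs_PageRead and the real recovery state; this is not the Sprite
- VM code): instead of sleeping, the read returns EWOULDBLOCK and the server
- process hands the request back to be retried later, leaving it free to service
- other queued calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page-fill request handled by a server process. */
typedef struct {
    int pageNum;
    int retries;      /* how many times it was deferred */
    bool done;
} PageFillRequest;

/* Hypothetical stand-in for "the file server is in recovery". */
static bool serverInRecovery = false;

/* Hypothetical non-blocking read: returns EWOULDBLOCK instead of waiting
 * for recovery to finish. */
static int PageReadNonBlocking(PageFillRequest *req)
{
    if (serverInRecovery) {
        return EWOULDBLOCK;
    }
    req->done = true;   /* stands in for actually filling the page */
    return 0;
}

/* One server-process invocation. It never sleeps on recovery: on
 * EWOULDBLOCK it records the deferral and returns true, telling the
 * caller to re-queue the request with a delay. */
bool ServePageFill(PageFillRequest *req)
{
    if (PageReadNonBlocking(req) == EWOULDBLOCK) {
        req->retries++;
        return true;    /* re-queue later */
    }
    return false;       /* finished */
}
```

- The retry delay and re-queuing mechanism are left to the caller; the point is
- only that no Proc_ServerProc sleeps on recovery.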
-
-
-
- 676.
- Date: Tue, 7 Nov 89 18:30:00 PST
- From: mgbaker (Mary Gray Baker)
- Subject: sun4 debug crash
-
- There's a bug in the sun4's that causes a cache write-back error when you
- try to debug a user process. This is new and very bad. I'm investigating
- now.
-
-
-
- 677.
- Date: Tue, 7 Nov 89 18:55:53 PST
- From: brent (Brent Welch)
- Subject: pmake installhdrs in mach
-
- When I do pmake installhdrs in the mach module
- it claims there are no sources. It doesn't
- attempt to go into the .md directories.
-
-
-
- 678.
- Date: Tue, 07 Nov 89 22:18:27 PST
- From: Fred Douglis <douglis>
- Subject: sun4 library
-
- i saw that the new finger (with the new loadavg database file) was
- successfully installed the other day for all types but sun4, so I
- tried to compile it again. It wouldn't link because the installed sun4
- libc.a was incomplete. When I tried to recompile, I had to rerun mkmf
- because lib/c/Makefile was set up only for "sun3" even though it
- looked like it hadn't been regenerated since last month sometime. Did
- someone edit it by hand?
-
-
-
- 679.
- Date: Wed, 8 Nov 89 23:43:12 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: more on process queue bug
-
- It seems fairly repeatable if you exit the console window on a sun3 such
- that your X window system gets torn down. You'll get a prompt back in
- the console window, but the first time you press a key the process
- queue overflows with calls to TransferInProc.
-
- I tried this twice on thyme running version 1.038. I looked at the
- stack but can't figure out who's putting all the calls in the queue.
- TransferInProc looks like it puts itself on the queue, but that shouldn't
- cause it to overflow.
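-
- One common way to keep per-character input from flooding a bounded callback
- queue is a "transfer pending" flag, so that only the first character of a
- burst schedules the transfer. This is a sketch under that assumption, not a
- diagnosis of the bug above; TtyModel, EnqueueCallback, and the tiny QUEUE_SIZE
- are hypothetical. In a kernel the flag would also have to be tested and set
- with interrupts masked or under the tty lock.

```c
#include <stdbool.h>

#define QUEUE_SIZE 4   /* hypothetical, deliberately tiny */

/* Hypothetical bounded callback queue standing in for Proc_CallFunc's. */
static int queueLen = 0;

static bool EnqueueCallback(void)
{
    if (queueLen == QUEUE_SIZE) {
        return false;           /* the real CallFunc panics here */
    }
    queueLen++;
    return true;
}

/* Hypothetical tty state: buffered characters plus a pending flag. */
typedef struct {
    int buffered;
    bool transferPending;
} TtyModel;

/* Called once per input character. Only the first character after a
 * transfer completes schedules the transfer callback; later ones just
 * buffer, so a burst costs one queue slot instead of one per character. */
bool TtyInputChar(TtyModel *tty)
{
    tty->buffered++;
    if (tty->transferPending) {
        return true;
    }
    if (!EnqueueCallback()) {
        return false;
    }
    tty->transferPending = true;
    return true;
}

/* The transfer callback: takes its slot off the queue, drains everything
 * buffered, and clears the flag so the next character can reschedule it. */
void TransferIn(TtyModel *tty)
{
    queueLen--;
    tty->buffered = 0;
    tty->transferPending = false;
}
```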
-
-
-
-
- 680.
- Date: Thu, 9 Nov 89 12:08:44 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: 1.039 flakey
-
- There is an unknown bug in 1.039 that trashes your stack. Hijack
- died twice with a messed up stack. All I was doing at the time was
- editing files. Thyme suffered a watchdog reset when I started a
- pmake. Mint recovered for some unknown reason, and instantaneously
- thyme reset.
-
- The only stable machine this morning has been the spur, but I can't
- get to it because my other workstations insist on dying.
-
-
-
- 681.
- Date: Thu, 9 Nov 89 11:58:58 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: spritemon
-
- When I try to use spritemon to display the cpu utilization of 5
- processors it screws up and only does 4, the fifth "pane" always
- being blank. Right now all 5 processors are pegged, but spritemon
- shows the 5th as having 0 utilization.
-
-
-
- 682.
- Date: Thu, 9 Nov 89 12:31:07 PST
- From: tve (Thorsten von Eicken)
- Subject: tftp/udp server failing (looping) on crackle (sun4)
-
- My syslog just showed the following. Is it relevant?
- <27>Nov 9 12:29:59 inetd[6370c]: tftp/udp server failing (looping), service terminated
-
-
-
- 683.
- Date: Thu, 9 Nov 89 14:27:40 PST
- From: mgbaker (Mary Gray Baker)
- Subject: Something funny with recovery?
-
- An ls to allspice hung on me. I wasn't getting recovery even quite a while
- after murder did, so I killed the ls and re-executed it. Then I got recovery
- and the ls succeeded.
-
-
-
- 684.
- Date: Thu, 09 Nov 89 14:09:44 PST
- From: rab (Robert A. Bruce)
- Subject: blob from hell lives!
-
- I have a blob from hell in one of my tx windows.
- Clearing the screen does not kill it, nor does
- selecting something in another window.
-
- I am running the default tx, compiled on Oct 17.
-
-
-
- 685.
- Date: Thu, 09 Nov 89 14:28:56 PST
- From: Fred Douglis <douglis>
- Subject: Re: Something funny with recovery?
-
- i thought there was a process that pinged and tried to recover, but i
- had the same problem -- paprika said waiting for recovery but didn't
- recover until i tried something new that made it talk to mint.
-
-
-
- 686.
- Date: Thu, 9 Nov 89 14:31:36 PST
- From: brent (Brent Welch)
- Subject: 2 O'Clock Glitch
-
- Did your machine go through recovery at 2 this afternoon,
- or perhaps at 11 this morning? These glitches correspond
- exactly with the times all the hosts are sampling their
- kernel statistics by running a little script.
- The global crontab is set up to do this
- at 8am, 11, 2, 5, and 8pm. The file servers take a sample
- every hour. Anyway, this overloads Mint enough to cause
- glitches. My machine got timeouts when writing back both
- its migInfo.new and migInfo files. It also had to try
- recovery twice. This is an interesting comment on scalability.
- In the meantime I'm going to add a pseudo-random sleep to the script that
- gets run by the crontab.
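-
- The same idea in C, as a sketch (the script itself is a shell script;
- JitterSeconds and the per-host seed are hypothetical): pick a random delay up
- to some maximum and sleep for it before sampling, so the hosts spread their
- load on Mint instead of all arriving at once.

```c
#include <stdlib.h>

/* Returns a delay in [0, maxSeconds]. Seeding per host (e.g. from a
 * hypothetical host ID) keeps hosts started by the same crontab entry
 * from choosing identical delays. */
unsigned int JitterSeconds(unsigned int maxSeconds, unsigned int hostSeed)
{
    srand(hostSeed);
    return (unsigned int)(rand() % (maxSeconds + 1));
}
```

- A caller would do sleep(JitterSeconds(300, hostID)) before the sampling step;
- 300 seconds is an arbitrary example bound.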
-
-
-
- 687.
- Date: Thu, 9 Nov 89 17:46:16 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: vfork behaves differently
-
- Our implementation is somehow different from the bsd implementation.
- I have a program that runs under unix, but not under sprite. When
- I changed the vfork to fork it works fine. I think the semantics
- of vfork state that the parent cannot run while the child is using
- its resources. This implies that the parent cannot run until the
- child exec's, and I have a hunch that isn't happening.
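-
- For reference, the conventional safe use of vfork() under BSD semantics: the
- child only execs, or calls _exit() if the exec fails, and the parent resumes
- once the child has done either. A sketch using standard POSIX/BSD calls;
- RunWithVfork is a hypothetical helper. A program that works with fork() but
- not vfork() is worth checking for child-side work between the vfork and the
- exec.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv` via vfork(). The child borrows the parent's
 * address space, so it does nothing but exec, or _exit() on failure.
 * Returns the child's exit status, or -1 on error. */
int RunWithVfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(path, argv);
        _exit(127);    /* exec failed: never return or call exit() here */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```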
-
-
-
- 688.
- Date: Thu, 9 Nov 89 23:50:08 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: xkill
-
- If I try to 'xkill' an iconified window my uwm exits.
- hijack<jhh 3> XIO: I/O error
- [2] Exit 1 uwm
- There is no man page for xkill either.
-
-
- 689.
- Date: Thu, 9 Nov 89 23:51:23 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: more xkill info
-
- I realized my last message was kind of lacking on details. The windows
- I want to kill, and the uwm that dies are on hijack running 1.034.
- I usually run xkill from a sun3, or in this case a sun4.
- xkill on a ds3100 doesn't do anything.
-
-
-
- 690.
- Date: Thu, 09 Nov 89 23:55:30 PST
- From: Fred Douglis <douglis>
- Subject: Re: more xkill info
-
- i believe the xkill/uwm problem exists under all configurations. uwm
- must "own" the icon when xkill goes to kill the client, or something.
- or uwm just doesn't handle the condition it hits. i don't think it's
- an xkill bug.
-
- xkill on a ds3100 usually works for me, though sometimes it's actually
- killed my X server in the process.
-
-
-
- 691.
- Date: Fri, 10 Nov 89 10:59:56 PST
- From: culler (David Culler)
- Subject: To print or not to print
-
- I can send files to lw533 from remote unix hosts (e.g. fennel), but not
- remote sprite hosts (e.g. cardamom). lpq says:
- waiting for queue to be enabled on shallot
-
-
-
- 692.
- Date: Fri, 10 Nov 89 15:29:51 PST
- From: ouster (John Ousterhout)
- Subject: Re: To print or not to print
-
- The problem is that at present every individual workstation has to
- be entered in a particular printer table somewhere. Bob, is there
- a way to set up lpd in a fashion similar to sendmail, so that all
- print requests coming from any sprite machine are considered to come
- from "sprite.berkeley.edu", so that only a single entry has to be
- made in the printer table to accommodate all Sprite hosts?
-
-
-
- 693.
- Date: Fri, 10 Nov 89 15:39:04 PST
- From: Fred Douglis <douglis>
- Subject: Re: To print or not to print
-
- Are you sure that's the problem? Seems to me there's a difference
- between unauthorized access (printing to the lps40, for example) and a
- spooling problem that claims a queue is disabled. I think it's the
- sprite printing software that's confused. I've seen this happen on
- paprika even with machines that could normally print.
-
-
-
- 694.
- Date: Sun, 12 Nov 89 15:07:43 PST
- From: pmchen (Peter M. Chen)
- Subject: mustard crash--FPU interrupt in kernel mode
-
- Fatal error: FPU Interrupt in Kernel mode
- Entering debugger with a Breakpoint trap exception at PC 0x800b5550
- I was running gremlin and ggraph and some other stuff.
- Mustard is a decstation.
- I am rebooting.
-
-
-
- 695.
- Date: Sun, 12 Nov 89 15:18:51 PST
- From: pmchen (Peter M. Chen)
- Subject: crash is repeatable
-
- Same crash (FPU interrupt in Kernel Mode), same error message (same PC).
- I was running the "new" kernel, which is 1.039, I think. To duplicate the
- problem:
- cd ~pmchen/simul/out/su_size_sy2
- simgg norm100k
- (You have to have my alias for simgg).
- Simgg runs several things, among them a nawk script and ggraph.
- I'm going to go back to the "sprite" kernel.
-
-
-
- 696.
- Date: Sun, 12 Nov 89 15:38:33 PST
- From: pmchen (Peter M. Chen)
- Subject: ggraph is the culprit
-
- Regarding the recent crashes: Apparently ggraph, when given bad input
- (example file is in ~pmchen/simul/out/su_size_sy2/debug.gg) can crash a
- decstation (haven't tried it on a sun3 yet).
-
- To duplicate the crash:
- cd ~pmchen/simul/out/su_size_sy2
- ggraph debug.gg
-
-
-
- 697.
- Date: Sun, 12 Nov 89 16:45:29 PST
- From: shirriff (Ken Shirriff)
- Subject: Makefile problem in /sprite/src/kernel/sprite
-
- If I do "pmake" in /sprite/src/kernel/sprite on the ds3100, it links
- a ds3100 kernel and then installs a sun3 kernel. If I do "pmake ds3100"
- it does the right thing.
-
-
- 698.
- Date: Sun, 12 Nov 89 17:45:46 PST
- From: brent (Brent Welch)
- Subject: Floating point on the DecStations
-
- John Hartman has mentioned that he thinks there is
- a race with the floating point unit when a trap
- is taken on the DecStation, which can result in
- a FPU trap in kernel mode. This is apparently
- the problem that Peter is having. I'm sending this
- because I'm not sure that John has posted a mail
- message about this. I do know that I've stopped
- running my floating point programs on the DecStations
- because they generate (NaN) every so often (divide
- by zero), and every so often they do this at the
- wrong time (time slice?) and cause their machine
- to panic.
-
-
-
- 699.
- Date: Sun, 12 Nov 89 17:52:03 PST
- From: tve (Thorsten von Eicken)
- Subject: problems I encountered when starting (a long time ago) on sprite
-
- I kept track of the major things I had problems with in the first weeks on
- sprite. Now that I've almost forgotten about the file, let me post it...
- Thorsten
-
- Instructions on how to boot machines.
- "-f tftp()foo" "le/ie(foo,goo,bar)gulp"
- howto make machines boot automatically
- F1-key combinations
- F1-k to kill window system, F1-A
- TX
- want customizable mouse actions (which mouse actions start, extend
- selections)
- tx shows #lines-1 x #columns-1 when resizing window
- tx insists on using ^U and ^H as kill/delete. look at parent tty!
- Various
- clarify cross compilation possibilities (sun3/sun4/ds3100) & problems
- howto mount foreign nfs file systems
- manual pages out of date
- mkmf creates *.md directories only for the machine type it is running on
-
-
-
- 700.
- Date: Sun, 12 Nov 89 20:22:52 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: Re: Floating point on the DecStations
-
- There is definitely a floating point problem on the Ds3100. I suspect
- the problem is flushing the fp pipeline when entering the kernel. If
- one of the instructions generates an exception (NaN for example) then
- you get an exception while in the kernel. The code has to be smarter
- and understand when exceptions are allowed and when they aren't.
- I'll talk to Mike Nelson and see if we can come up with a simple solution.
- Unfortunately I don't have time to look into it right now.
-
-
-
- 701.
- Date: Fri, 10 Nov 89 12:34:31 PST
- From: Adam R de Boor <deboor@buddy.Berkeley.EDU>
- Subject: Re: more xkill info
-
- xkill issues an XKillClient using the window resource ID it gets back from the
- button press. The server makes no distinction between windows that are
- owned by the window manager and those that are owned by other clients. Since
- the XKillClient causes the server to forcibly shut down the connection between
- it and the client, there's nothing uwm can do if you click on an icon
- when running xkill (Fred's right: the icons are owned by the window manager).
-
- You will have to de-iconify the window and then run xkill. Sorry I never
- wrote a man page for the beast. It struck me as self-explanatory, but something
- describing foibles such as this would probably be a good thing....
-
-
-
- 702.
- Date: Sun, 12 Nov 89 15:15:50 PST
- From: pmchen (Peter M. Chen)
- Subject: RpcDoCall
-
- Burble intercepted my broadcast for / in a reboot.
- I have no idea what this means, and everything proceeded hunky-dory after
- a 1 minute wait.
- This was on mustard
-
-
-
-
-
- 703.
- Date: Mon, 13 Nov 89 15:24:24 PST
- From: tve (Thorsten von Eicken)
- Subject: queue to lps40 hung
-
- mint.Berkeley.EDU: waiting for queue to be enabled on ginger
- Rank   Owner  Job  Files               Total Size
- 1st    tve    115  (standard input)    50273 bytes
-
-
- 704.
- Date: Sat, 18 Nov 89 14:32:30 PST
- From: brent (Brent Welch)
- Subject: Change symbolic links to remote links
-
- My measurements indicate that symbolic links between domains cause
- the bulk of the pathname redirections. These can be eliminated
- by converting these cross-domain links to remote links and
- setting up the server to export them. This can be done on
- a live system by first exporting the prefix
- % prefix -x /tmp -l /c/tmp
- and then changing the symbolic link to a remote link.
- Remember to update the server's mount table with an entry like:
- Export /tmp /c/tmp
-
- Pathname redirections occur in about 15% of all lookups. Mint sees
- the most: about 22% of its lookups bounce through a symbolic link.
- Overall, 0.04% of lookups bounce through a remote link, although Mint
- sees many of these too: 0.48%, up from 0.04% before ``/sprite/src''
- was added.
-
- Here is the set of symbolic links in '/'
- lrwxrwxrwx  1 root     5 Jun 29 09:16  X -> /b/X
- lrwxrwxrwx  1 nelson   7 Jul 13 16:37  X11 -> /a/X11
- lrwxrwxrwx  1 root    11 Oct 30 13:32  X11R3 -> /mic/X11R3
- lrwxrwxrwx  1 root    12 Oct 26  1987  att -> /sprite/att
- lrwxrwxrwx  1 root    13 Jan 22  1988  bin -> /sprite/cmds
- lrwxrwxrwx  1 root     9 Aug 11 12:59  emacs -> /c/emacs
- lrwxr-xr-x  1 root    12 Aug 10  1987  lib -> /sprite/lib
- lrwxrwxrwx  1 root    13 Aug  7  1988  prob -> /test/rmprob
- lrwxrwxrwx  1 root     8 Jul 11 11:21  raid -> /b/raid
- lrwxrwxrwx  1 root    16 Jul 26 12:57  spare -> /rosemary/spare
- lrwxrwxr-x  1 root    13 Jun 15  1988  swap -> /sprite/swap
- lrwxrwxrwx  1 root     9 Nov 15 10:04  t88 -> /tmurder
- lrwxrwxrwx  1 root    13 Oct 18 15:26  tftpboot -> /sprite/boot
- lrwxrwxrwx  1 root    10 Aug 10 16:30  ultrix -> /c/ultrix
-
- Note also that /swap is a link to /sprite/swap, and everything
- there is also a link. I know mint shouldn't swap to the root
- domain, but it seems like /swap could be changed back to
- a directory, and the link for mint could point to '/sprite/swap/32'
- while the others would be links to '/swap1/hostnum'.
- The link from /tftpboot to /sprite/boot is probably ok to leave,
- but all the bin directories should probably be exported so that
- mint is out of the loop.
- brent
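-
- A minimal sketch for finding conversion candidates, assuming a POSIX
- system: it lists the symbolic links in one directory together with
- their targets, much like the listing of '/' above. Deciding which
- targets cross a domain boundary is left to the caller, since that
- depends on the prefix table. list_symlinks and link_cb are
- hypothetical names, not Sprite library routines.
-
```c
/* Hypothetical helper: list the symbolic links in one directory. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PATH_BUF 4096

/* Called once per link with the entry name and the link's target text. */
typedef void (*link_cb)(const char *name, const char *target, void *arg);

/*
 * Returns the number of symbolic links found in dir, or -1 if dir
 * cannot be opened.  cb may be NULL when only the count is wanted.
 */
int list_symlinks(const char *dir, link_cb cb, void *arg)
{
    DIR *d;
    struct dirent *ent;
    const char *sep;
    int count = 0;

    assert(dir != NULL && dir[0] != '\0');
    sep = (dir[strlen(dir) - 1] == '/') ? "" : "/";  /* avoid "//name" */
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        char path[PATH_BUF], target[PATH_BUF];
        struct stat sb;
        ssize_t len;
        int n = snprintf(path, sizeof path, "%s%s%s", dir, sep, ent->d_name);

        if (n < 0 || n >= (int) sizeof path)
            continue;                 /* name too long: skip it */
        if (lstat(path, &sb) != 0 || !S_ISLNK(sb.st_mode))
            continue;                 /* lstat does not follow the link */
        len = readlink(path, target, sizeof target - 1);
        if (len < 0)
            continue;
        target[len] = '\0';           /* readlink does not NUL-terminate */
        if (cb != NULL)
            cb(ent->d_name, target, arg);
        count++;
    }
    closedir(d);
    return count;
}
```
-
- Called on "/" with a callback that prints "name -> target", it should
- produce roughly the listing above without the permission, owner and
- date columns. lstat is used rather than stat so that the links
- themselves are examined instead of the files they point to.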
-
-
- 705.
- Date: Mon, 13 Nov 89 18:23:54 PST
- From: brent (Brent Welch)
- Subject: System crash
-
- Mint was accidentally rebooted with an old kernel on Sunday night,
- and it died Monday afternoon with a known bug. Unfortunately,
- Oregano got confused sometime later and wedged things for a while.
- I debugged in and re-discovered an ugly problem I'd forgotten about.
- Somehow a Proc_ServerProc is leaving a handle locked and then
- going away. This quickly screws things up. It seems clearly
- related to recovery, so I'll spend some time looking at the code.
- brent
-
-
- 706.
- Date: Sat, 18 Nov 89 16:23:18 PST
- From: brent (Brent Welch)
- Subject: dump & bug status
-
- Murder was put into the debugger on Saturday afternoon,
- and it was part way through a dump at the time. Could
- someone in 477 check up on this? It was in a recovery
- loop with Mint, but I accidentally killed mint trying
- to continue execution and catch the problem.
-
- I've found yet another bug in the Rpc_Daemon process,
- another subtle synchronization thing that showed up
- under load.
-
- There was also a deadlock on TimerMutex
- that I didn't understand. Two processes were in
- Timer_ScheduleRoutine. One was being interrupted
- so it must have been executing near the LOCK/UNLOCK
- code. Someone might verify that interrupts
- are being disabled soon enough in the LOCK_MONITOR
- macro so that the deadlock warning is correct.
- brent
-
-
- 707.
- Date: Sat, 18 Nov 89 16:56:17 PST
- From: pmchen (Peter M. Chen)
- Subject: mail had no "to" field
-
- The following mail had no "to" field in the header. I suspect it was
- being sent to Garth.
-
- >From eklee Sat Nov 18 00:10:18 1989
- >Received: by sprite.Berkeley.EDU (5.59/1.29)
- > id AA797494; Sat, 18 Nov 89 00:10:18 PST
- >Date: Sat, 18 Nov 89 00:10:18 PST
- >From: eklee (Edward K. Lee)
- >Message-Id: <8911180810.AA797494@sprite.Berkeley.EDU>
- >Subject: Second order disk model
- >Cc: pmchen
- >Status: R
- >
- >Pete and I have "perfected" a second order disk model.
- >This model takes as parameters, the number of cylinders, the step time,
- >the average seek time, and the full stroke seek time.
- >The model guarantees the values of the step, average and full stroke seek
- >times to equal that of the parameters.
- >We compared the amdahl drive characteristic to that predicted by the model
- >and they were very very close.
- >The model is in ~eklee/diskparam.
- >
- >Ed
- >
-
-
- 708.
- Date: Sat, 18 Nov 89 16:58:22 PST
- From: pmchen (Peter M. Chen)
- Subject: long running job dies on apathy
-
- I have a script which dies on apathy, but not on any other machine.
- On apathy it dies with
- MachExceptionHandler: User bus error on ld or st
-
- The program can be run by:
- cd ~pmchen/simul
- go apathy
- (You probably have to be me to get the paths, etc. right).
-
-
-
- 709.
- Date: Tue, 14 Nov 89 12:29:35 PST
- From: pmchen (Peter M. Chen)
- Subject: mustard.Berkeley.EDU: waiting for queue to be enabled on coriander
-
- This has kept me from being able to print for the last couple of hours (during
- which I've tried rebooting mustard, coriander, power cycling the printer,
- etc.).
-
- Any tips as to how to continue? This happens consistently when I print
- several jobs in a row.
-
- mustard.Berkeley.EDU: waiting for queue to be enabled on coriander
- Rank   Owner   Job  Files                     Total Size
- 1st    pmchen  317  /users/pmchen/reminders   1997 bytes
-
-
-
- 710.
- Date: Tue, 14 Nov 89 10:26:48 PST
- From: pmchen (Peter M. Chen)
- Subject: printing many things
-
- When printing many things, one after another, weird stuff happens with the
- print daemon. First it stalls and doesn't send to coriander (the unix
- machine which serves our printer). Then an lpq returns with:
- mustard% lpq
-
- mustard.Berkeley.EDU: Warning: no daemon present
- Rank   Owner   Job  Files               Total Size
- 1st    pmchen  286  (standard input)    16461 bytes
- 2nd    pmchen  287  (standard input)    14940 bytes
- 3rd    pmchen  288  (standard input)    13119 bytes
- 4th    pmchen  289  (standard input)    12937 bytes
-
- no entries
-
- We can fix things by rebooting coriander, but that's hardly a good
- long term solution. It's odd because coriander can still print fine.
-
-
-
- 711.
- Date: Tue, 14 Nov 89 12:41:56 PST
- From: tve (Thorsten von Eicken)
- Subject: printer, printer, printer, where are you?
-
- [gluttony tve] lpq -Plps40
- gluttony.Berkeley.EDU: waiting for queue to be enabled on ginger
- Rank   Owner  Job  Files               Total Size
- 1st    johnw  133  shifter.ps          8573 bytes
- 2nd    johnw  134  shift_block.ps      12586 bytes
- 3rd    johnw  135  shifter.bdnet       2121 bytes
- 4th    johnw  136  shift_block.bdnet   480 bytes
- 5th    johnw  137  shift_block.bdnet   480 bytes
-
- ginger.Berkeley.EDU: connection to ucbarpa is down
-
- -------------------- at the same time --------------
- ernie[tve] lpq -Plps40
-
- Rank    Owner   Job  Files             Total Size
- active  fisher  261  standard input    14058 bytes
-                                        0 bytes
- 2nd     gill    238  taduty            1369 bytes
- 3rd     fisher  263  standard input    47549 bytes
- 4th     fisher  753  standard input    17739 bytes
- 5th     fisher  754  standard input    15685 bytes
-
-
-
- 712.
- Date: Tue, 14 Nov 89 10:33:56 PST
- From: brent (Brent Welch)
- Subject: -Ppulla restarted
-
- I was able to get the printer in Peter Chen's office
- going again by restarting the lpd process on sage.
- Their printer is called "pulla", by the way.
-
-
-
- 713.
- Date: Tue, 14 Nov 89 16:02:28 PST
- From: eklee (Edward K. Lee)
- Subject: possible pmake bug
-
- I was trying to run pmake in ~eklee/sim on a ds3100.
- Pmake complains about:
- #if (%(TM) == "ds3100")
- "local.mk", line 3: Warning: Malformed conditional (( %(TM) == "ds3100" ))
- but accepts:
- #if (%(TM) == "sun3")
-
-
-
- 714.
- Date: Wed, 15 Nov 89 13:42:56 PST
- From: jhh@sprite.Berkeley.EDU (John H. Hartman)
- Subject: ntalkd bug
-
- Ntalkd was in an infinite loop on hijack. It was also changing pids every
- so often. I put it into the debugger, but was unable to find an unstripped
- version of the binary, nor was I able to build a new binary.
-
-
-
-
- 715.
- Date: Wed, 15 Nov 89 11:52:33 PST
- From: gibson (Garth Gibson)
- Subject: pmake errors
-
- On basil VERSION 1.034 (sun3) (17 Oct 89 14:18:43)
- I saw these messages:
- <11>Nov 15 11:37:58 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:37:58 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:38:37 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:38:38 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:38:51 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:38:52 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:39:27 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
- <11>Nov 15 11:39:27 syslog: Db_Open: error opening file /sprite/admin/migInfo.new: permission denied.
-
- then my pmake hung with the message:
- basil 579> make
- --- sun3.md/cvscan.o ---
- rm -f sun3.md/cvscan.o
- cc -g -DNODATA -DTESTING=1 -DKERNEL=1 -L../raidlib -L../sim -g -O -msun3 -I/users/gibson/lib/include -I. -I. -Isun3.md -I../raidlib -I../sim -I../sim/sun3.md -I/sprite/src/kernel/dev -I/sprite/src/kernel/dev/sun3.md -I/sprite/src/kernel/Include -I/sprite/src/kernel/Include/sun3.md -c cvscan.c -o sun3.md/cvscan.o
- --- sun3.md/devDisk.o ---
- rm -f sun3.md/devDisk.o
- cc -g -DNODATA -DTESTING=1 -DKERNEL=1 -L../raidlib -L../sim -g -O -msun3 -I/users/gibson/lib/include -I. -I. -Isun3.md -I../raidlib -I../sim -I../sim/sun3.md -I/sprite/src/kernel/dev -I/sprite/src/kernel/dev/sun3.md -I/sprite/src/kernel/Include -I/sprite/src/kernel/Include/sun3.md -c devDisk.c -o sun3.md/devDisk.o
- make: Child (54f) not in table?
-
- ps tells me that make is in some form of infinite loop:
- USER PID %CPU %MEM SIZE RSS STATE TIME PR COMMAND
- gibson b054e 75.0 3.3 424 272 READY 3:43 make
- gibson 2050b 13.2 11.7 1648 960 READY 235:08 Xsprite :0
- gibson 4051d 2.4 7.2 616 592 READY 8:29 tx =80x34+0-0
-
- I did a Ctl-C on the make and it went into the debugger with the syslog msg:
- MachTrap: Bus error in user proc b054e, PC = f254, addr = 2e63207b BR Reg 2c020
-
- the directory I am working in is ~gibson/RAID/sim.RAID/work
-
- i reran the make, got more Db_Open permission denied messages then make
- died with:
- basil 580> make
- make: Lockfile owned by you -- ignoring it
- --- sun3.md/mult.o ---
- rm -f sun3.md/mult.o
- cc -g -DNODATA -DTESTING=1 -DKERNEL=1 -L../raidlib -L../sim -g -O -msun3 -I/users/gibson/lib/include -I. -I. -Isun3.md -I../raidlib -I../sim -I../sim/sun3.md -I/sprite/src/kernel/dev -I/sprite/src/kernel/dev/sun3.md -I/sprite/src/kernel/Include -I/sprite/src/kernel/Include/sun3.md -c mult.c -o sun3.md/mult.o
- mult.c: In function mult:
- mult.c:43: warning: assignment of pointer from integer lacks a cast
- --- sun3.md/pseudoIO.o ---
- rm -f sun3.md/pseudoIO.o
- cc -g -DNODATA -DTESTING=1 -DKERNEL=1 -L../raidlib -L../sim -g -O -msun3 -I/users/gibson/lib/include -I. -I. -Isun3.md -I../raidlib -I../sim -I../sim/sun3.md -I/sprite/src/kernel/dev -I/sprite/src/kernel/dev/sun3.md -I/sprite/src/kernel/Include -I/sprite/src/kernel/Include/sun3.md -c pseudoIO.c -o sun3.md/pseudoIO.o
-
- Segmentation violation
-
-
- MachTrap: Bus error in user proc 53d, PC = 739e, addr = 2d672035 BR Reg 20
-
- so I tried "pmake -x", got more Db_Open permission denied messages and
- another pmake: Child (d0539) not in table?
-
-
-
-
- 716.
- Date: Wed, 15 Nov 89 13:35:22 PST
- From: brent (Brent Welch)
- Subject: Proc_Lock race?
-
- Kvetching didn't quite make it out of recovery.
- I found that it was in Proc_WakeWaitingProcesses,
- stuck in Proc_Lock on an unused, unlocked process
- table entry. The condition variable was also zero,
- which means no one thought it was being waited on.
- The lock information said the process
- table entry had been last locked by some other process
- that had also gone away by this time. It looks like
- there is some race between ProcFreePCB, Proc_Lock,
- and Proc_LockPID.
-
- Here is an abstract of each routine. Any ideas?
-
- Proc_Lock(procPtr)
- {
- LOCK_MONITOR;
- while (procPtr->genFlags & PROC_LOCKED) {
- (void) Sync_Wait(&procPtr->lockedCondition, FALSE);
- }
- procPtr->genFlags |= PROC_LOCKED;
- UNLOCK_MONITOR;
- }
-
- Proc_Unlock(procPtr)
- {
- LOCK_MONITOR;
- procPtr->genFlags &= ~PROC_LOCKED;
- Sync_Broadcast(&procPtr->lockedCondition);
- UNLOCK_MONITOR;
- }
-
- ProcFreePCB(procPtr)
- {
- LOCK_MONITOR;
- while (procPtr->genFlags & PROC_LOCKED) {
- (void) Sync_Wait(&procPtr->lockedCondition, FALSE);
- }
- procPtr->state = PROC_UNUSED;
- procPtr->genFlags = 0;
- UNLOCK_MONITOR;
- }
-
- Proc_LockPID(pid)
- Proc_PID pid;
- {
- LOCK_MONITOR;
- procPtr = proc_PCBTable[Proc_PIDToIndex(pid)];
- while (TRUE) {
- if (procPtr->state == PROC_UNUSED || procPtr->state == PROC_DEAD) {
- procPtr = (Proc_ControlBlock *) NIL;
- break;
- }
- if (procPtr->genFlags & PROC_LOCKED) {
- do {
- (void) Sync_Wait(&procPtr->lockedCondition, FALSE);
- } while (procPtr->genFlags & PROC_LOCKED);
- } else {
- if (!Proc_ComparePIDs(procPtr->processID, pid)) {
- procPtr = (Proc_ControlBlock *) NIL;
- } else {
- procPtr->genFlags |= PROC_LOCKED;
- }
- break;
- }
- }
-
- UNLOCK_MONITOR;
- return(procPtr);
- }
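-
- For comparison, here is a user-space analogue of the Proc_LockPID
- pattern using POSIX threads; it is a sketch, not Sprite kernel code,
- and the table, the entry_* routines and the ID scheme are hypothetical.
- Unlike the abstract above, it compares the ID on every pass through the
- loop, before deciding to wait, so a waiter that wakes up on a freed or
- reused slot returns NULL instead of taking the lock. Note that
- Proc_Lock, which takes a pointer, has no such re-check at all. Whether
- that would have avoided the Kvetching hang is not established here.
-
```c
/* User-space analogue of Proc_LockPID; not Sprite kernel code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8                  /* slot = id % TABLE_SIZE */

enum entry_state { ENTRY_UNUSED, ENTRY_ACTIVE };

struct entry {
    enum entry_state state;
    unsigned id;                      /* full ID, including generation */
    bool locked;                      /* plays the role of PROC_LOCKED */
    pthread_cond_t unlocked;          /* broadcast when locked is cleared */
};

static pthread_mutex_t monitor = PTHREAD_MUTEX_INITIALIZER;
static struct entry table[TABLE_SIZE];

void table_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i].state = ENTRY_UNUSED;
        table[i].locked = false;
        pthread_cond_init(&table[i].unlocked, NULL);
    }
}

/* Put id into its slot; the slot must currently be unused. */
struct entry *entry_activate(unsigned id)
{
    struct entry *e = &table[id % TABLE_SIZE];

    pthread_mutex_lock(&monitor);
    assert(e->state == ENTRY_UNUSED);
    e->state = ENTRY_ACTIVE;
    e->id = id;
    e->locked = false;
    pthread_mutex_unlock(&monitor);
    return e;
}

/* Lock the entry for id, or return NULL if it was freed or reused. */
struct entry *entry_lock_id(unsigned id)
{
    struct entry *e = &table[id % TABLE_SIZE];

    pthread_mutex_lock(&monitor);
    for (;;) {
        /* Re-validate on every pass, including after each wakeup. */
        if (e->state != ENTRY_ACTIVE || e->id != id) {
            e = NULL;
            break;
        }
        if (!e->locked) {
            e->locked = true;
            break;
        }
        pthread_cond_wait(&e->unlocked, &monitor);
    }
    pthread_mutex_unlock(&monitor);
    return e;
}

void entry_unlock(struct entry *e)
{
    pthread_mutex_lock(&monitor);
    assert(e != NULL && e->locked);
    e->locked = false;
    pthread_cond_broadcast(&e->unlocked);  /* every waiter re-checks */
    pthread_mutex_unlock(&monitor);
}

/* Like ProcFreePCB: wait until nobody holds the entry, then free it. */
void entry_free(struct entry *e)
{
    pthread_mutex_lock(&monitor);
    while (e->locked)
        pthread_cond_wait(&e->unlocked, &monitor);
    e->state = ENTRY_UNUSED;
    pthread_mutex_unlock(&monitor);
}
```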
-
-
-
- 717.
- Date: Wed, 15 Nov 89 18:13:35 PST
- From: gibson (Garth Gibson)
- Subject: brk bug
-
- I was reading comp.os.mach and I saw this brk bug testing program (below).
- I compiled it on rosemary and ernie where it passes, but on basil and
- apathy it fails. As it fails to "free" user heap store and users do not
- often free heap store to the system, you may not care.
- garth
-
- /*
- ** From: mcm@rti.UUCP (Mike Mitchell)
- ** Subject: Mach 2.5 bug
- ** Keywords: kernel expand(), PTE's
- ** Date: 16 Nov 89 00:35:56 GMT
- ** Organization: Research Triangle Institute, RTP, NC
- **
- ** I have run into a problem with Mach 2.5. It is a problem that has been in
- ** BSD 4.X until BSD 4.3-Tahoe. The fix is well understood for BSD systems,
- ** but I'm not sure how it fits into the Mach kernel.
- **
- ** The problem is that memory pages are not returned properly when using the
- ** 'brk()' library routine to free them. More specifically, the PTE entries
- ** are not invalidated properly when shrinking a region. I can supply some
- ** diffs to fix the problem for BSD systems, but I've never seen Mach source.
- **
- ** Anyway, try running the enclosed program. Please tell me if it works on
- ** your machine, and if so, what version of Mach and the type of CPU.
- **
- * This program shows off a problem with the kernel's "expand()" routine.
- */
- #include <signal.h>
- #include <stdio.h>
- #include <stdlib.h>
- #include <unistd.h>
-
- /* SIGSEGV handler: only async-signal-safe calls (write, _exit) here. */
- static void segv(int sig)
- {
-     static const char msg[] = "Your brk routine works correctly.\n";
-
-     (void) sig;
-     (void) write(1, msg, sizeof msg - 1);
-     _exit(0);
- }
-
- int main(void)
- {
-     char *old_break;
-     volatile char *cp;              /* volatile: keep both stores below */
-     long i;
-
-     signal(SIGSEGV, segv);
-
-     i = sysconf(_SC_PAGESIZE);
-     old_break = sbrk(0);            /* get the current "break" */
-     if (brk(old_break + 2*i) != 0) {    /* bump it up 2 pages */
-         perror("brk");
-         exit(2);
-     }
-
-     cp = old_break + i + 256;
-     *cp = 1;                        /* write into a new page */
-
-     (void) brk(old_break);          /* release the memory */
-
-     *cp = 2;                        /* write into the page again; this */
-                                     /* time you should get a SIGSEGV */
-
-     printf("Your brk routine is broken!\n");
-     return 1;
- }
-
- /*
- ** Mike Mitchell {decvax,seismo,ihnp4,philabs}!mcnc!rti!mcm mcm@rti.rti.org
- **
- ** "If you hear me talking on the wind, You've got
- ** to understand, We must remain perfect strangers" (919) 541-6098
- */
-
-
- 718.
- Date: Wed, 15 Nov 89 18:22:22 PST
- From: brent (Brent Welch)
- Subject: Makefile broken in /sprite/src/kernel/sprite
-
- The Makefile in /sprite/src/kernel/sprite only works
- if a TM environment variable is set. I don't ordinarily
- set this. I got the following error messages before
- I figured that I should set TM.
-
- "Makefile", line 28: Warning: Malformed conditional (!empty(TM))
- "Makefile", line 30: #if-less #else
- "Makefile", line 32: #if-less #endif
- Fatal errors encountered -- cannot continue
-
-
-
- 719.
- Date: Wed, 15 Nov 89 19:00:18 PST
- From: brent (Brent Welch)
- Subject: Failed recovery
-
- I guess I have to take back my earlier complaints about
- page faults using up all the Proc_ServerProcs such that
- recovery is prevented. Sage failed to recover after
- Allspice rebooted, and I learned something by debugging
- it. The Proc_ServerProcs are not used at all! They
- were all available. There is some other reason that
- recovery doesn't kick in, and I haven't figured it out,
- yet. Also, I didn't find anybody stuck on the Proc_Lock,
- like what happened to Kvetching. Anyway, please let
- me know if your machine doesn't make it through recovery.
- I need to take another look at it.
-
-
-
- 720.
- Date: Wed, 15 Nov 89 19:05:24 PST
- From: tve (Thorsten von Eicken)
- Subject: /sprite/admin/howto/addNewHost
-
- I'm in the process of adding buzz (a sun3), here's what I'm encountering:
-
- #2. /etc/spritehosts is checked in (RCS) by mendel. I had to override.
- #3. /tftpboot is now on mint, not on ginger
- #3. the ndboot stuff seems to be bogus. The internet-address-file link is to
- sun3.md/netBoot (at least I think)
- #3. well, the whole stuff with the devices and so is bogus, isn't it?
- #4. it seems this step HAS gone away
- #5. the fsmakedev is unclear. What's the serialB business? It is not said
- that a dev directory has to be created in /hosts/foo, and that the
- syslog should go there.
- #7. what's this "export command for the root partition"?
- #10. /etc/hosts.equiv is checked out by jhh
- #10. what's the business of 'hostname' vs. 'hostname.Berkeley.EDU' in
- /etc/hosts.equiv?
-
- Ok, except that I can't find the netBoot for sun3's with a lance ethernet, it
- seems I got through...
- Thorsten
-
-
-
- 721.
- Date: Thu, 16 Nov 89 02:08:07 PST
- From: shirriff (Ken Shirriff)
- Subject: Allspice problems
-
- Just as I decided to go home tonight, allspice started spewing out
- consist reply errors on pride. I checked allspice and it had
- about 40 tftpd's in the debugger. I tried to debug one of the
- tftpd's from nutmeg, but nutmeg hung. I tried to debug tftpd
- from allspice, but this seemed to upset mint, which started trying
- to do recovery with allspice and failed. Since I couldn't access
- anything from allspice, I couldn't do any debugging so I rebooted it.
- I then found that the ipServer on mint seemed to be in an infinite
- loop but I couldn't debug it because accessing /sprite/src/daemons
- needed to wait on allspice. At this point, mint was printing out
- heaps of messages. Allspice came back up and I left Bob to look
- at the ipServer.
-
-
-
- 722.
- Date: Thu, 16 Nov 89 08:25:07 PST
- From: brent (Brent Welch)
- Subject: RPC Ethernet Protocol
-
- I suspect that once again Sprite RPC is colliding
- with some other Ethernet protocol. While we changed
- our protocol number away from the XNS_IDP number (0x600),
- we now use (0x500), a nice round number that is probably
- used for some other protocol. All the messages about
- RPC version mismatch are probably due to this.
-
- The fact that the Sun4 net module doesn't recompile is
- also a problem, because the network interface gets
- reset after too many errors, and eventually this can
- tickle the bug where a sender gets hung. Allspice
- is still susceptible to this bug. I'll bet that's
- what happened last night. There were lots of complaints
- about bad RPC packets at oregano, and lots of trouble
- between it and allspice.
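-
- One detail that may be relevant, offered as background rather than a
- diagnosis: in an IEEE 802.3 frame the 16-bit field after the source
- address is a payload length when it is 1500 (0x05DC) or less, and an
- EtherType only from 1536 (0x0600) up. So 0x600 is the lowest usable
- type value, while 0x500 (1280) can be read as a length field by 802.3
- software. A minimal C sketch of that classification; the function and
- enum names are hypothetical:
-
```c
/* Classify the 16-bit type/length field of an Ethernet frame header. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum type_field_kind { FIELD_LENGTH, FIELD_UNDEFINED, FIELD_ETHERTYPE };

/*
 * 802.3 rule: values up to 1500 (0x05DC) give the payload length,
 * values from 1536 (0x0600) up are EtherTypes, and 1501..1535 are
 * undefined.
 */
enum type_field_kind classify_type_field(uint16_t value)
{
    if (value <= 0x05DC)
        return FIELD_LENGTH;
    if (value < 0x0600)
        return FIELD_UNDEFINED;
    return FIELD_ETHERTYPE;
}

/*
 * Fetch the field from a raw frame: it follows the 6-byte destination
 * and 6-byte source addresses and is stored big-endian.
 */
uint16_t frame_type_field(const uint8_t *frame)
{
    assert(frame != NULL);
    return (uint16_t) ((frame[12] << 8) | frame[13]);
}
```
-
- For example, classify_type_field(0x0500) yields FIELD_LENGTH, while
- classify_type_field(0x0600), the old XNS_IDP number, yields
- FIELD_ETHERTYPE.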
-
-
-
- 723.
- Date: Thu, 16 Nov 89 10:56:59 PST
- From: johnw (John Wawrzynek)
- Subject: TLB fault
-
- I have been experiencing the following when I use emacs rmail to
- respond to a message:
-
- Bad user TLB fault in process xxx: pc=4752e8 addr=646e6553
-
- xxx is an emacs process. Thanks.
-
-
-
- 724.
- Date: Thu, 16 Nov 89 11:28:43 PST
- From: Fred Douglis <douglis>
- Subject: Re: hung walls => pseudo-device startup bug
-
- seems like you could preserve the blocking semantics, if you think
- they're desirable, with two fixes: first, if the server exits, go
- through and find any processes blocked on the pdev; and second, make
- the open call use some sort of callback so that the open doesn't get
- hung and is instead delayed and retried. That way it would be
- interruptible. It seems like all pdev-related RPCs should really be
- done in a way that the failure of a user-level process won't hang
- another process forever. spring cleaning item, maybe?
- Fred
-
-
- 725.
- Date: Thu, 16 Nov 89 11:11:58 PST
- From: ouster (John Ousterhout)
- Subject: Trashed mail file
-
- My mail inbox (/sprite/spool/mail/ouster) got trashed again today,
- but the symptoms lead me to believe it's sendmail that's doing the
- trashing. There are two messages in the mailbox where exactly one
- line (or perhaps less than a line?) got messed up. Here is the raw
- text from the inbox:
-
- From douglis Thu Nov 16 10:25:35 1989
- Received: from garnet.Berkeley.EDU by sprite.Berkeley.EDU (5.59/1.29)
- id AA663622; Thu, 16 Nov 89 10:25:33 PST
- Received: by garnet.berkeley.edu (5.57/1.32)
- id AA25948; Thu, 16 Nov 89 08:50:54 PST
- Date: Thu, 16 Nov 89 08:50:54 PST
- From: c60b2-am@garnet.berkeley.edu (Kevin Gong)
- Message-Id: <8911161650.AA25948@garnet.berkeley.edu>
- To: c60b2-am@garnet.berkeley.edu, ouster@sprite.Berkeley.EDU
- Subject: Re: "value" vs. "machineCode"
-
- Well, it's in the homework description, but it's also in the skeleton
- for one or more of the programs (classify.c, and/or findIns.c) in
- the comments.
-
- - kevin
-
- From douglis Thu Nov 16 10:31:35 1989
- Received: from janus.Berkeley.EDU by sprite.Berkeley.EDU (5.59/1.29)
- id AA401482; Thu, 16 Nov 89 10:31:31 PST
- Received: by janus.Berkeley.EDU (5.57/1.34)
- id AA00686; Thu, 16 Nov 89 08:33:54 PST
- Date: Thu, 16 Nov 89 08:33:54 PST
- From: ilp@janus.Berkeley.EDU (Shelley Sprandel)
- Message-Id: <8911161633.AA00686@janus.Berkeley.EDU>
- To: ouster@sprite.Berkeley.EDU
- Subject: ILP meeting & software
- Cc: ilp@janus.Berkeley.EDU, neureuth@esvax.berkeley.edu
-
- I talked with Andy Neureuther about having Cindy at the meeting. He feels
- it might be better not to have her there. I'll have copies of the list
- of faculty responses she's received. Andy also wants to know the status
- of several things: the Commerce Dept. GTDAs and the draft of the license
- to companies who want to use the software commercially.
-
- Unfortunately, Cindy has very bad carpal tunnel syndrome problems,
- has two doctor's appointments today, and won't be in. I'll try calling
- her at home. She comes in at 7:00 usually, so we should have the
- information in time for the meeting.
- -Shelley
-
- Notice that the messages appear to be perfectly well-formed except that
- the first "From" line in each message lists Fred as the sender instead of
- the real sender. These messages were consecutive in the mailbox. The
- cleanliness of the substitution makes me think it isn't a random
- file-system error that's doing it, but rather something in the mailer.
- I've saved a copy of the whole mailbox in ~ouster/mail.bad in case
- anyone wants to look at the bits in more detail.
-
- By the way, Fred, I suspect that two messages from you were lost. Can
- you resend them?
- -John-
-
-
- 726.
- Date: Thu, 16 Nov 89 11:15:51 PST
- From: Fred Douglis <douglis>
- Subject: Re: Trashed mail file
-
- mint was acting up before and i had to restart the ipServer and
- associated daemons. but before that, i found that a bunch of daemons
- weren't running, and i started sendmail by hand. i also ran "sendmail
- -q" to process the mail queue. for some reason, mail delivered by
- that sendmail run came out as "From douglis" for both you and mary.
- sendmail is setuid, so i don't know why that would be.
-
-
-
- 727.
- Date: Thu, 16 Nov 89 11:22:52 PST
- From: Fred Douglis <douglis>
- Subject: uwm bug
-
- the new uwm in X11R3 apparently doesn't pass environment variables
- properly. I can't start programs from within uwm unless I specify a
- display on the command line. /X/cmds.ds3100/uwm works fine. I can
- start programs from my shell just fine.
-
-
-
- 728.
- Date: Thu, 16 Nov 89 11:24:01 PST
- From: brent (Brent Welch)
- Subject: Re: hung walls => pseudo-device startup bug
-
- Apparently I need to fix the pseudo-device implementation
- so open attempts by clients are denied, not blocked,
- if the server process hasn't fully started up. Currently
- there is some situation where rlogind creates
- a ``/hosts/foo/rloginN'' pseudo-device, forks a child,
- and exits without finishing its startup duties as
- a pseudo-device server. The child process hangs,
- and subsequent wall processes also hang, because they
- too are clients of the pseudo-device.
-
-
-
- 729.
- Date: Thu, 16 Nov 89 01:35:07 -0800
- From: tve@ernie.Berkeley.EDU (Thorsten Von Eicken)
- Subject: Re: hung walls => pseudo-device startup bug
-
- I just looked at allspice's syslog:
- The world seems to be in endless recovery loops. Crackle thinks
- allspice is recovering every 30 seconds or so. Allspice has messages
- about mint and oregano recovering all the time.
-
-
-
- 730.
- Date: Thu, 16 Nov 89 14:03:40 PST
- From: Fred Douglis <douglis>
- Subject: trashed file
-
- I found a file with a bunch of nulls in it. Since the file is updated
- every 5 minutes, I can put a bound on when the problem occurred: after
- yesterday at 12:40 pm, and probably before yesterday at 1:25 pm.
- Assault did not reboot at that time or since.
-
- I put the file in /user2/BADFILES/mig-usage; it's a couple of
- megabytes so if no one is interested in it then it should be deleted.
-
-
-
- 731.
- Date: Thu, 16 Nov 89 14:38:31 PST
- From: douglis (Fred Douglis)
- Subject: recovery killed X
-
- after kvetching recovered, the X server just kept printing out
- "WaitForSomething() errno=22" over and over.
-
-
-
- 732.
- Date: Thu, 16 Nov 89 22:00:47 PST
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: mint hit consistency deadlock again
-
- mint wedged with the good old problem where lots of rpc servers backed up
- on a consistency-in-progress flag for host 18, whichever that is (is spritehosts
- stored on unix anywhere??).... when i tried to continue mint to see what
- i might learn, it died because as usual i forgot to say "pid 0" before
- continuing it.
-
-
-
- 733.
- Date: Thu, 16 Nov 89 21:02:35 PST
- From: david@rosemary.Berkeley.EDU (David A. Wood)
- Subject: Unstoppable pmake??
-
- I have been having some problems with pmake getting in an unkillable
- state tonight. The process 'WAIT's right at the beginning. Perhaps
- for a filesystem?? In any case, it does not respond to a ^C or ^Z,
- nor can I kill it with kill -KILL.
- The system is fine; I can rlogin again, but I can't get any work done.
-
-
-
- 734.
- Date: Thu, 16 Nov 89 22:54:50 PST
- From: shirriff (Ken Shirriff)
- Subject: frexp?
-
- ldexp(...) defined in gnulib/ds3100.md/ldexp.c calls frexp(...) which
- doesn't seem to be defined anywhere for the ds3100, which means my
- compiles bomb with Undefined: frexp.
- (ldexp and frexp are defined in include/sun3.md for the sun3.)
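- For reference, frexp(x, &e) splits x into a mantissa m with 0.5 <= |m| < 1
- and an exponent e such that x == m * 2^e. Until the ds3100 library has one,
- a portable stand-in along these lines could fill the gap. This is a minimal
- sketch, not the sun3 or gnulib code; it assumes IEEE doubles, and the name
- portable_frexp is hypothetical (a real replacement would be named frexp).
-

```c
#include <math.h>   /* isfinite() */

/* Hypothetical stand-in for frexp(): returns m with 0.5 <= |m| < 1 and
 * stores e in *expPtr such that x == m * 2^e. Zero is returned with
 * *expPtr = 0, as the C standard requires; infinities and NaNs are passed
 * through unchanged (their exponent is unspecified, so 0 is stored). */
double portable_frexp(double x, int *expPtr)
{
    if (x == 0.0 || !isfinite(x)) {
        *expPtr = 0;
        return x;
    }
    double m = (x < 0.0) ? -x : x;
    int e = 0;
    /* Scaling by 2 is exact in binary floating point, so no bits are lost. */
    while (m >= 1.0) {
        m /= 2.0;
        e++;
    }
    while (m < 0.5) {
        m *= 2.0;
        e--;
    }
    *expPtr = e;
    return (x < 0.0) ? -m : m;
}
```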
-
-
-
-
- 735.
- Date: Fri, 17 Nov 89 08:49:18 PST
- From: brent (Brent Welch)
- Subject: trashed stat files
-
- I made another pass through all my data files and turned up
- a number of trashed ones. They all had the first 2 or 3
- Kbytes zeroed out, which points to a fragment bug. I've
- been saving these in files named 'nuked.08:05:01.Z' (with
- the appropriate date stamp) if their original was 'rawstat.08:05:01.Z'.
- I'm leaving them in their original directory like this
- so I can get an idea of when they get trashed vs. when they
- were created. My current thoughts are that they get trashed
- shortly after they are generated, probably in the delayed write
- logic. I suspect that their cache block is being re-used too
- soon, or something. All of these files are under 3K, and either
- the first 2K are zero, or the whole thing is null. There is
- some sleight-of-hand done when a fragmented file has to be
- written out because it has to be realigned, and I'll bet that's
- broken.
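- Since the symptom here is a run of NUL bytes at the start of a small file,
- a quick scan for that pattern can help find other victims among the data
- files. A minimal sketch, assuming the file contents are already in memory;
- the function names are hypothetical, and the prefix length is a parameter
- (the report suggests about 2K).
-

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the NUL bytes at the start of buf before the first nonzero byte. */
size_t leading_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    while (n < len && buf[n] == 0)
        n++;
    return n;
}

/* Heuristic check for the reported symptom: the first `prefix` bytes are
 * all zero, or the whole file is zero if it is shorter than `prefix`. */
bool looks_trashed(const unsigned char *buf, size_t len, size_t prefix)
{
    if (len == 0)
        return false;                 /* an empty file is not evidence */
    size_t want = (len < prefix) ? len : prefix;
    return leading_zero_bytes(buf, len) >= want;
}
```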
-
-
-
- 736.
- Date: Fri, 17 Nov 89 15:32:42 PST
- From: ouster (John Ousterhout)
- Subject: Fred is sending a lot of mail
-
- All the mail I've received in about the last hour has come out
- with Fred as the sender (probably the same as a day or two ago, except
- this time it isn't going away).
-
-
-
- 737.
- Date: Fri, 17 Nov 89 15:50:19 PST
- From: Fred Douglis <douglis>
- Subject: Re: Fred is sending a lot of mail
-
- I think someone must have restarted mint's ipServer by hand (it wasn't
- a low process ID such as would be the case if it were the ipServer
- that started at boot-time). No sendmail background daemon was around.
- I started it as myself. Why sendmail persists in putting
-
- >From douglis
-
- at the start of each message is beyond me. I'll restart it as root
- and see how it goes.
-
-
-
-
- 738.
- Date: Sat, 18 Nov 89 20:52:22 PST
- From: tve (Thorsten von Eicken)
- Subject: /sprite/admin/responsibilities
-
- not many people have confessed up to now....
-
-
-
- 739.
- Date: Sun, 19 Nov 89 01:11:13 PST
- From: tve (Thorsten von Eicken)
- Subject: need tsort program
-
- ... can't find it on Sprite ...
-
-
-
- 740.
- Date: Sun, 19 Nov 89 02:12:37 PST
- From: tve (Thorsten von Eicken)
- Subject: pmake problem
-
- I have problems with traditional makefiles (for make, not pmake).
- These makefiles (over 200!) are in the octtools distribution which I'm trying
- to compile on sprite.
-
- The problem is that the makefiles tend to construct very long lines to
- circumvent the problem that every script line runs in its own shell. For
- example:
- > cleaninstall:
- > @echo "# $(MAKE) $@"
- > @for x in `${MAKEORDER} ${TOOLS}`; do \
- > echo "cd $$x ; $(MAKE) install clean" ; cd $$x ; \
- > $(MAKE) $(MFLAGS) $(MAKEVARS) install clean ;\
- > echo "cd .. # done in $$x ($@)" ; cd .. ; \
- > done
- What happens is that the very long line (started by "for x..." in this
- example) gets truncated *silently* somewhere. Having tried several things,
- I suspect that it's pmake that clips the line. Could you please check and
- fix pmake? To test: "cd /mic/octtools/common/src; make cleaninstall".
- You should get a "/bin/sh: syntax error at line 1: `end of file' unexpected".
- Thorsten
-
-
- 741.
- Date: Mon, 20 Nov 89 11:01:26 PST
- From: pmchen (Peter M. Chen)
- Subject: lost mail
-
- I recently found out about some mail that did not get delivered to sprite.
- In the past, sprite has updated my /sprite/spool/mail/pmchen file
- incorrectly (mail is dropped in the middle of the file, etc.). I believe the
- mail was sent from touati@arpa (Herve, let me know if I'm wrong on that).
-
- I think handling mail incorrectly will be a strong impetus to use unix, at
- least to send and receive mail.
-
-
-
- 742.
- Date: Mon, 20 Nov 89 13:02:18 PST
- From: mendel (Mendel Rosenblum)
- Subject: Fs_SetAttributes bug
-
-
- Doing a RCS "ci" on a symbolic link creates a file which ls thinks is a
- symbolic link.
-
- The problem occurs because Fs_SetAttributes doesn't check to see that
- the permission field contains a valid permission value. When "ci" chmod's the
- new RCS file it somehow ends up with permission bits of 0xffffa16d rather
- than 0x16d. This causes Fs_SetAttributes to set the permission field of
- the file descriptor to 0xffffa16d. When ls does a stat() on this file
- the compatibility library does:
-
- unixAttsPtr->st_mode = spriteAttsPtr->permissions |
- CvtSpriteToUnixType(spriteAttsPtr->type);
-
- The extra bits in the spriteAttsPtr->permissions field are OR'ed into the
- type field, causing the type field to become S_IFLNK.
-
- I've fixed the compatibility library to handle bogus spriteAttsPtr->permissions
- fields. Someone should patch the hole in the Fs_SetAttributes syscall.
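- The compatibility-library side of the fix can be as simple as masking the
- permission word down to the permission bits before OR-ing in the type. The
- sketch below uses the classic Unix octal values for S_IFMT, S_IFREG and
- S_IFLNK and a 07777 permission mask; the function names are hypothetical
- and this is not the actual compatibility library code. With the reported
- value 0xffffa16d, the unmasked result has type bits 0120000 (S_IFLNK),
- while the masked result keeps only the intended 0x16d.
-

```c
/* Classic Unix mode-bit values (octal), prefixed so they do not collide
 * with <sys/stat.h>. */
#define SKETCH_S_IFMT  0170000u   /* file-type field */
#define SKETCH_S_IFREG 0100000u   /* regular file */
#define SKETCH_S_IFLNK 0120000u   /* symbolic link */
#define SKETCH_PERMS   07777u     /* setuid/setgid/sticky + rwxrwxrwx */

/* Hypothetical helper: build st_mode from a possibly bogus Sprite
 * permission word and an already-converted Unix file type. Masking keeps
 * stray high bits (e.g. 0xffffa16d) out of the type field. */
unsigned int sketch_st_mode(unsigned int permissions, unsigned int unixType)
{
    return (permissions & SKETCH_PERMS) | (unixType & SKETCH_S_IFMT);
}

/* The unmasked combination, as in the quoted compatibility code. */
unsigned int sketch_st_mode_unmasked(unsigned int permissions,
                                     unsigned int unixType)
{
    return permissions | unixType;
}
```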
-
-
-
- 743.
- Date: Mon, 20 Nov 89 13:34:53 PST
- From: mendel (Mendel Rosenblum)
- Subject: Files not being dumped.
-
- If you create a directory structure such that the full pathnames of the files
- are more than 100 characters then the files will not be dumped by the
- Sprite dump program. The following files were not dumped by the last
- dump:
-
- /mic/octtools/common/lib/technology/scmos/msu/s150/mag/cs250_pads/SPUR_PADS/OCT_PADS/hgp/physical/contents;
-
- and
-
- /mic/octtools/common/lib/technology/scmos/msu/s150/mag/cs250_pads/SPUR_PADS/OCT_PADS/hgp/physical/interface;
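- The cutoff matches the 100-byte name field in the classic tar header (the
- two pathnames above are 106 and 107 characters long). Assuming the dump
- program uses that header format, a pre-dump scan could flag offending
- paths with a check like the following; the helper name is hypothetical.
-

```c
#include <stdbool.h>
#include <string.h>

/* Width of the name field in a classic (V7-style) tar header. */
#define TAR_NAME_LEN 100

/* Hypothetical pre-dump check: true if the full pathname is short enough
 * to be stored in the header's name field, per the "more than 100
 * characters" limit reported above. */
bool fits_tar_name(const char *path)
{
    return strlen(path) <= TAR_NAME_LEN;
}
```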
-
-
-
- 744.
- Date: Mon, 20 Nov 89 13:42:32 PST
- From: Fred Douglis <douglis>
- Subject: another dump bug report
-
- the error messages mendel saw from the dump program were not mailed to
- me; all I got was the following message:
-
- >>>>> On Mon, 20 Nov 89 13:32:27 PST, root@sprite.Berkeley.EDU (The Sprite God) said:
-
- root> To: douglis
- root> Dump completed successfully.
-
- root> Level 1 dump on Mon Nov 20 11:46:26 1989
- root> /user1
- root> /user2
- root> /sprite
- root> /sprite/src
- root> /sprite/src/kernel
- root> /mic
- root> /b
- root> /c
- root> /
-
- which means we could be hitting errors that Bob (or whoever is doing
- the dumps) never finds out about.
-
- Also, the dumps are not getting run automatically from murder's
- crontab -- I've had to do them by hand. And, the tape that's marked
- for yesterday's dump (11/19) ran okay at first, but when the dump
- program exited after the rlogin connection that had started it died, i
- got file mark errors on that tape and had to move on to the next tape.
-
-
-
-
- 745.
- Date: Mon, 20 Nov 89 16:49:43 PST
- From: Fred Douglis <douglis>
- Subject: ds3100 X server / keyboard problem
-
- after I tear down X, when typing at the console, I get
- a lot of bouncing -- many characters are echoed (and input)
- twice, especially if the shift key is held.
-
-
-
-
- 746.
- Date: Mon, 20 Nov 89 21:09:47 PST
- From: brent (Brent Welch)
- Subject: awk loop on sun4
-
- Awk went into an infinite loop on anise.
- The same thing works ok on a sun3.
- To repeat:
- cd ~brent/postrawstat/Results.cache
- awk -f AwkCacheClt mustard.sun3.jul-nov.var-
-
-
-
-
- 747.
- Date: Tue, 21 Nov 89 03:14:58 PST
- From: eklee (Edward K. Lee)
- Subject: files slaughtered
-
- Today, many of my files disappeared from sprite. Most of the
- missing files seem to be binaries but I'm not sure of that
- yet.
- The directory ~eklee/cmds.md disappeared altogether. This
- is the second time that this particular directory has
- disappeared.
-
- Bob, I would appreciate it greatly if you could restore
- ~eklee/cmds.md as soon as possible (I need it to start up
- X).
-
- thanks,
-
-
-
- 748.
- Date: Tue, 21 Nov 89 10:02:27 PST
- From: Fred Douglis <douglis>
- Subject: *.fsc files
-
- The location of these files keeps changing, so they're hard to find.
- For example, there are *.fsc files dated July in /mintA/boot, and it
- wasn't until Mendel suggested that I look in /hosts/mint that I found
- the current ones. The thing is, they print "checking /dev/..."
- without any indication of the date, so correlating error messages with
- boottimes is hard.
-
- By the way... Ed has lots of files in /sprite/lost+found, but nothing
- all that recent, and mint rebooted a few days ago -- not in the past
- day. I'm still interested in hearing when the last time is that Ed's
- sure the directory and files did exist okay.
-
-
-
-
- 749.
- Date: Wed, 22 Nov 89 10:17:50 PST
- From: mendel (Mendel Rosenblum)
- Subject: sun4 register trash bug: low priority
-
- The sun4 window underflow handler trashes some user accessible registers it
- probably should not. For example, the following routine returns 1 on SunOS
- and some value like 503315628 on Sprite.
-
- .globl _foo
- _foo:
- save %sp, -96, %sp
- call CallDeepEnoughtToFlushWindows
- nop
- mov 1,%o1
- ret
- restore %o1,%g0,%o0
-
- The problem occurs when the restore causes a window underflow. The underflow
- handler trashes the %o1 register which is used when the restore is reissued.
- This is not a high priority problem because the C compiler never generates
- code using restore in this way. The library routine longjmp() does use
- restore in this way and so longjmp(jmp_buf,1) causes the setjmp() to return
- with 503315628 rather than 1. If this becomes a problem we can probably make
- longjmp a few instructions longer and get around the problem.
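- A portable way to check for the longjmp symptom on a given machine is a
- small program that jumps back from a few calls deep with longjmp(env, 1)
- and confirms that setjmp() then returns exactly 1. This is a sketch, not a
- reproduction of the assembly above: the recursion depth is arbitrary and
- is not guaranteed to force a window underflow on any particular SPARC
- model, and the function names are hypothetical.
-

```c
#include <setjmp.h>

static jmp_buf jumpEnv;

/* Recurse `depth` levels, then jump back to the setjmp() call. The depth is
 * an arbitrary choice meant to leave several stack frames behind. */
static void descend(int depth)
{
    if (depth <= 0)
        longjmp(jumpEnv, 1);
    descend(depth - 1);
}

/* Returns 1 if setjmp() came back with the value passed to longjmp(),
 * -1 if it returned some other nonzero value (the reported sun4 symptom). */
int longjmp_returns_one(int depth)
{
    switch (setjmp(jumpEnv)) {
    case 0:
        descend(depth);   /* does not return normally */
        return 0;
    case 1:
        return 1;
    default:
        return -1;
    }
}
```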
-
-
-
- 750.
- Date: Wed, 22 Nov 89 10:20:11 PST
- From: Fred Douglis <douglis>
- Subject: dumps going from bad to worse
-
- I changed crontab to pipe the output of dailydump into "Mail douglis"
- and got the following message at the time it was run. I think the
- problem may be with cron rather than dumps; murder's ipserver died as
- i tried to investigate further so I can't say for sure yet. But
- here's the note:
-
- ------- Forwarded Message
-
- Date: Wed, 22 Nov 89 02:00:07 -0800
- From: root@sprite.Berkeley.EDU (The Sprite God)
- To: douglis@sprite.Berkeley.EDU
-
-
- lost+found
- spriteCory
- .Xdefaults
-
-
-
-
- ------- End of Forwarded Message
-
-
- At the same time that the ipServer died, the dumps started up (from
- crontab again, as I was debugging it) -- but died a moment later with
- a "catastrophic formatting error" from the exabyte. After popping the
- tape out and putting it back in (to make sure the tape had rewound), I
- couldn't get a green light from the exabyte. We finally power-cycled
- the exabyte and it came back.
-
-
-
-
- 751.
- Date: Wed, 22 Nov 89 14:21:34 PST
- From: Fred Douglis <douglis>
- Subject: dist file prot bug: migInfo
-
- Mike had trouble getting migration to kick in down at WRL because
- /sprite/admin/migInfo had the wrong permissions. Someone complained
- that our copy up here temporarily had the wrong permissions too. In
- case the distribution isn't already set up to create this file with
- mode 0666, I figured I'd report this.
-
-
-
-
- 752.
- Date: Wed, 22 Nov 89 16:46:17 PST
- From: shirriff (Ken Shirriff)
- Subject: Tx bug
-
- I grep'd through a file that wasn't ascii and my tx window went into
- an infinite loop. I couldn't figure out what was wrong before dbx
- decided to complain about Illegal Instructions, so this will probably
- have to be filed away until it recurs. The problem seems to be in
- Sx_Notify line 291, where it is trying to figure out the notifier
- size by calling EndOfLine to take chunks of the line. EndOfLine
- is stepping through the string, but somehow Sx_Notify keeps starting
- over and processing the same string.
-
-
-
- 753.
- Date: Fri, 24 Nov 89 15:19:10 PST
- From: tve (Thorsten von Eicken)
- Subject: mailbox corrupted
-
- This time it's my mailbox which got affected. Fred's message
- about RCS'ed systemfiles landed in the middle of one of the
- messages I left in my mailbox.
-
-
-
- 754.
- Date: Sat, 25 Nov 89 11:12:56 PST
- From: pmchen (Peter M. Chen)
- Subject: mint is somewhat hosed
-
- Getting lots of
- <reopen> 11/25/89 11:11:51 mint (32) RPC timed-out
- 11/25/89 11:11:51 mint (32) Recovery failedrpc timeout
-
- Am unable to get to user commands on clients.
-
-
-
- 755.
- Date: Sat, 25 Nov 89 11:43:17 PST
- From: shirriff (Ken Shirriff)
- Subject: Mail got trashed
-
- I got two mail messages merged together into one:
-
- Message 118:
- >From netlibd@surfer.EPM.ORNL.GOV Fri Nov 24 23:50:33 1989
- Date: Sat, 25 Nov 89 02:50:10 -0500
- From: netlibd@surfer.EPM.ORNL.GOV (Netlib)
- To: shirriff@sprite.Berkeley.EDU
- Subject: send linpackc from bench
-
- Sorry, no such library is available. Recheck the general index.
- Here are some example requests, in case syntax is the problem:
-
- send index
- send index for eispack
- send rg from eispack
- who is eric grosse
-
- Received: by sprite.Berkeley.EDU (5.59/1.29)
- id AA335964; Sat, 25 Nov 89 11:12:56 PST
- Date: Sat, 25 Nov 89 11:12:56 PST
- From: pmchen (Peter M. Chen)
- Message-Id: <8911251912.AA335964@sprite.Berkeley.EDU>
- To: bugs
- Subject: mint is somewhat hosed
-
- Getting lots of
- <reopen> 11/25/89 11:11:51 mint (32) RPC timed-out
- 11/25/89 11:11:51 mint (32) Recovery failedrpc timeout
-
- Am unable to get to command user commands on clients.
-
-
-
- 756.
- Date: Sun, 26 Nov 89 15:10:05 PST
- From: ouster (John Ousterhout)
- Subject: Allspice reboot
-
- I rebooted Allspice this afternoon. It was refusing to talk to Mace,
- even after I L1-N'ed it to reset its network interface and pinged Mace
- from Allspice. Allspice did seem to talk to just about everyone else,
- and strangely enough the act of preparing it to reboot managed to clear
- up the condition with Mace (I went back to my office halfway through
- the Allspice boot cycle and discovered that Mace was no longer hanging).
-
-
-
-
- 757.
- Date: Sun, 26 Nov 89 16:15:22 PST
- From: Fred Douglis <douglis>
- Subject: ds3100 duplicate memory free panic
-
- kvetching died sometime this morning with a message about
- freeing a block that was already free, but i was unable to
- attach to it to debug the corpse. this is just for the record, to
- see if it's a fluke or the start of a trend.
-
-
-
-
- 758.
- Date: Mon, 27 Nov 89 10:33:11 PST
- From: mendel (Mendel Rosenblum)
- Subject: missing man pages from dump and restore
-
- >From /sprite/admin/howto/restoreAFile:
-
- > 6. For more information see the manual entries for `dump' and `restore'.
-
- murder% man restore
- No manual entry for "restore".
- murder% man dump
- No manual entry for "dump".
-
-
-
-
- 759.
- Date: Mon, 27 Nov 89 13:17:28 PST
- From: Fred Douglis <douglis>
- Subject: inetd/login problem explained
-
- george taylor was told to run "/hosts/hijack/restartservers", which is
- a setuid shell script that starts up various daemons. that explains
- why the real userid was gibson (taylor didn't exist until just now)
- and why i never had any trouble suing and then restarting daemons.
- making it a setuid shell script also means that when sendmail is
- restarted it will probably think it was run by a mere mortal and would
- post mail as if it were "From <user>" instead of the real person
- sending the mail.
-
- In other words, mere mortals shouldn't have to restart servers
- themselves, but if they have to, it must be done with the real userID
- set to root.
-
-
-
- 760.
- Date: Wed, 29 Nov 89 11:36:48 PST
- From: brent (Brent Welch)
- Subject: cross-loading
-
- Earlier I reported:
- ld: Bad machine type, not M_SPARC, for /usr/lib/libnet.a(Net_EtherAddrTo)
- when trying to make a sun4 kernel on sloth.
-
- This is because /usr/lib/libnet.a is a symbolic link to
- /sprite/lib/%MACHINE.md/libnet.a. This means that somehow
- only libc is special-cased to work for cross-compilation
- (cross-loading, actually.) Do we know this? Do we like it?
-
-
-
- 761.
- Date: Thu, 30 Nov 89 09:30:49 PST
- From: culler (David Culler)
- Subject: Dare I say, Ere SOSP
-
- I've encountered a couple of strange things on Sprite recently.
-
- (1) I sometimes lose typeout. It just stops echoing characters, although
- output from programs is displayed. This happens after I logout. It also
- seems to happen after running talk. In the second case, exiting the tx
- window and firing up a new one fixed it. In the other situation I had
- to reboot.
-
- (2) When an 'rsh' command is performed (I do this to print from Fennel)
- I get a message: "ioctl: Operation not supported on socket". The remote
- command does seem to take place, however.
-
- (3) The above situation arises because I can no longer get to lw533
- from sprite. For a while I could. Now lpq says, "Warning lw533 is down:
- sending to shallot". Unfortunately, nothing ever gets to shallot.
-
- (4) If I run dvi2ps on my machine and try to print the ps file,
- it looks rather impressionistic. Lots of interesting boxes, but few
- characters. Filtering the same dvi file through dvi2ps on fennel works
- fine.
-
- btw. Emacs still gets upset in trying to write files on Unix hosts.
-
-
-
- 762.
- Date: Thu, 30 Nov 89 10:36:35 PST
- From: ouster (John Ousterhout)
- Subject: fsattach man page
-
- This man page is a bit out-of-date. For example, it refers to
- "/local" in a few places.
-
-
-
- 763.
- Date: Thu, 30 Nov 89 13:51:55 PST
- From: mgbaker (Mary Gray Baker)
- Subject: cc1.68k
-
- cc1.68k goes into the debugger when run on a sparcstation trying to compile
- vmBoot.c for the sun3.
-
-
-
- 764.
- Date: Thu, 30 Nov 89 19:36:51 PST
- From: mgbaker (Mary Gray Baker)
- Subject: C library hash routine quite broken
-
- For Hash_CreateEntry, the test to see if an entry existed already was
- backwards. It should have been "if (!bcmp(...))" but was instead
- "if (bcmp(...))".
-
- What uses the C library hash routines? I know the kernel doesn't.
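- The fix follows from bcmp()'s convention: it returns zero when the two
- byte ranges are identical, so a hash lookup that has found a matching key
- must test !bcmp(...) (equivalently, bcmp(...) == 0). The sketch below shows
- the corrected test on a bucket chain, using the standard memcmp(), which
- has the same zero-means-equal convention; the Entry type and find_entry()
- are hypothetical and are not the C library's Hash_ code.
-

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket entry with a fixed-size key, for illustration only. */
typedef struct Entry {
    struct Entry *next;
    char key[8];
} Entry;

/* Walk a bucket chain and return the entry whose key matches, or NULL.
 * memcmp() (like bcmp()) returns 0 when the bytes are equal, so the
 * "found it" test must be == 0, i.e. !memcmp(...). */
Entry *find_entry(Entry *chain, const char key[8])
{
    for (Entry *e = chain; e != NULL; e = e->next) {
        if (memcmp(e->key, key, sizeof e->key) == 0)
            return e;
    }
    return NULL;
}
```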
-
-
-
- 765.
- Date: Thu, 30 Nov 89 19:54:25 PST
- From: mendel (Mendel Rosenblum)
- Subject: Re: gdb on sun3
-
- > gdb reports the stack as:
-
- > #0 0xe380 in Sig_Send ()
- > #1 0x1b in ?? ()
- > (gdb)
-
- > and I can't even see the stack frame of the main() routine.
-
- Try typing "si" and things will look better. Gdb is having trouble
- backtracing the stack after the Sig_Send syscall. The si causes it
- to execute the "addql #4,sp" instruction after the "trap #1" and put
- the stack in a format gdb can backtrace.
-
-
-
-
- 766.
- Date: Fri, 01 Dec 89 11:27:39 PST
- From: Fred Douglis <douglis>
- Subject: xhost
-
- % ls -l /X11R3/cmds.ds3100/xhost
- -rwxrwxr-x 1 stolcke 44 Nov 30 12:15 /X11R3/cmds.ds3100/xhost*
- % cat /X11R3/cmds.ds3100/xhost
- echo Access control buggy--no action taken.
-
- but if i try to run an x application on another host, i get an error,
- perhaps because that host (treason) isn't in /etc/X0.hosts.
-
-
-
- 767.
- Date: Sat, 2 Dec 89 13:30:28 PST
- From: shirriff (Ken Shirriff)
- Subject: Makefile for bib
-
- If I do a "make install" on bib, it puts the new bib in
- /users/shirriff/cmds.ds3100/bib instead of /sprite/cmds.ds3100/bib.
- Is there any reason for this? I moved the previous bib to
- /sprite/cmds.3100/bib.old and installed the new one myself, since the
- old installed bib hangs if it can't find a reference.
-
-
-
-
- 768.
- Date: Sat, 2 Dec 89 17:46:26 PST
- From: mendel (Mendel Rosenblum)
- Subject: sparcStation out-of-PMEGs bug
-
- Jaywalk hung on me when I ran a program that generated a large file. The
- reason it hung was that it allocated almost all its PMEGS to the kernel and
- file system cache. This is the same problem we saw on allspice. Sometime
- we need to patch the VM/filesystem not to wire down the PMEGs mapping the
- file cache. Until then we should limit the size of the file system cache
- on the sparcstations.
-
-
-
-
- 769.
- Date: Wed, 6 Dec 89 19:08:17 PST
- From: mendel (Mendel Rosenblum)
- Subject: sun3, sun4 allow *(char *)(-1)
-
- Both the sun3 and sun4 allow a user program to read a byte from the address
- 0xffffffff without an error. This is not true of the sun4c.
-
-
-
-
- 770.
- Date: Thu, 7 Dec 89 00:56:05 PST
- From: tve (Thorsten von Eicken)
- Subject: HELP mail seems flaky
-
- I know John Wawrzynek lost mail (he told me). I have a curiously empty
- mailbox, but I don't know whether I actually lost anything. Mint had
- tons of sendmail error messages on its console when we tried fixing it
- this afternoon from the out-of-processes state. Could someone please
- have a careful & thorough look into it? Is there any way to recover, or
- at least to get a list of the senders of lost messages? I do think this
- is important, some people are getting upset.
-
-
-
-
- 771.
- Date: Thu, 7 Dec 89 12:51:15 PST
- From: pmchen (Peter M. Chen)
- Subject: hard to send mail from arpa to sprite
-
- Herve Touati has consistently had problems in sending mail from
- ucbarpa to sprite. So far mail has been: 1) appended into the middle
- of my /sprite/spool/mail/pmchen file, 2) dropped totally, and 3) deferred:
- bad file number. He resent it:
-
-
-
-
- 772.
- Date: Thu, 7 Dec 89 16:17:29 PST
- From: shirriff (Ken Shirriff)
- Subject: gremlin bug
-
- If you start up gremlin "gremlin foo.grn" (where foo.grn is a gremlin file)
- and then hit undo inside gremlin, gremlin seg. faults on a sun3.
-
-
-
- 773.
- Date: Thu, 07 Dec 89 20:47:20 PST
- From: Fred Douglis <douglis>
- Subject: more sendmail problems
-
- mint's sendmail existed but was refusing connections since sometime
- around 5 or 6 today. mint's ipserver pid implies that perhaps it was
- restarted by hand. anyone know anything about this? anyway, i
- started a new sendmail. i can't debug the old sendmail since there's
- no unstripped binary -- i'll try to install a new one.
-
-
-
- 774.
- Date: Sat, 9 Dec 89 01:21:00 PST
- From: elm (ethan miller)
- Subject: problems with variables in Mail
-
- There are a bunch of variables, both set in .mailrc and environment,
- that seem not to show up in Mail. Among these are tabstr, prompt,
- and MBOX. The last, especially, creates a bit of a problem. This
- is occurring on a ds3100. No crashes, just a lack of some variables
- taking effect (they show up as set, but don't do anything). Does
- anyone know why this might be?
-
-
-
-
- 775.
- Date: Sun, 10 Dec 89 08:25:35 PST
- From: tve (Thorsten von Eicken)
- Subject: ntalkd doesn't link on sun4s
-
- --- sun4.md/ntalkd ---
- rm -f sun4.md/ntalkd
- cc -g -O -msun4 -Dsprite -Dsun4 -I. -Isun4.md -o sun4.md/ntalkd sun4.md/announce.o sun4.md/print.o sun4.md/process.o sun4.md/table.o sun4.md/talkd.o
- process.c:234: Undefined symbol _Ulog_GetAllLogins referenced from text segment
-
-
-
- 776.
- Date: Sun, 10 Dec 89 12:31:08 PST
- From: mendel (Mendel Rosenblum)
- Subject: Processes in NEW state on sparcstations
-
- Sparcstations seem to be collecting processes with a state of NEW. For
- example from jaywalk:
-
- jaywalk% ps -a | grep NEW
- 71223 NEW 0:00 sh -c /c/stats/RAW
- a1222 NEW 0:00 sh -c /c/stats/RAW
- 11225 NEW 0:00 mkdir jaywalk/10Dec
- 41224 NEW 0:00 test ! -d jaywalk/10Dec
- 2122d NEW 0:00 test 5 != 0
-
-
-
-
- 777.
- Date: Sun, 10 Dec 89 19:02:06 PST
- From: mendel (Mendel Rosenblum)
- Subject: sparcstation watchdog reset
-
- When I tried to attach to an Xsprite that was in the debugger on jaywalk, the machine
- got a watchdog reset. It looks like there is a bug in the code that
- handles window underflow traps with bad stack pointers. It appears
- to do the wrong thing when Proc_SuspendProcess returns.
-
-
-
-
- 778.
- Date: Sun, 10 Dec 89 21:32:31 PST
- From: mgbaker (Mary Gray Baker)
- Subject: Re: sparcstation watchdog reset
-
- The code in the window underflow stuff isn't supposed to handle anything
- further if the process has a bad stack pointer. That's why I was originally
- calling ProcExitInt on those processes, to make sure nothing returned into the
- window underflow handler at that point. I switched to Proc_SuspendProcess so
- that we might be able to debug processes with bad stack pointers due to a
- suggestion from an optimist that Proc_SuspendProcess doesn't return. It causes
- a context switch, but I guess I'm confused about what happens when attaching to
- a process on the debug list. If it causes it to return from
- Proc_SuspendProcess into the underflow handler, then all hell will break loose
- and I will indeed need to do something more complicated than just calling
- Proc_SuspendProcess. I have 2 ideas of what to do, and I'll work on it.
-
-
-
-
-
- 779.
- Date: Mon, 11 Dec 89 16:49:49 PST
- From: pmchen (Peter M. Chen)
- Subject: lpr queuing again
-
- When I issue a printer job after not printing anything for a while, it
- gets stuck in the "sending to coriander" stage. To fix it, I can
- lprm the job and resend it, which usually works (sometimes I need to
- do it multiple times). This happens consistently.
-
-
-
- 780.
- Date: Tue, 12 Dec 89 12:22:05 PST
- From: douglis (Fred Douglis)
- Subject: restore failed!
-
- it finally reached /b after some intolerable period of time, and
- immediately went into the debugger -- perhaps because it tried to
- restore lost+found, which existed? there was no error message, just
- a statement that it was in the debugger.
-
-
-
-
- 781.
- Date: Tue, 12 Dec 89 12:52:14 PST
- From: mendel (Mendel Rosenblum)
- Subject: restore calls abort() during reload
-
- The code in tar for the "-n" flag is not documented in the man page,
- not listed in the "tar -help" list, and doesn't appear to work correctly.
- It seems to cause tar to delete all the
- files in a directory that are not on the dump tape. (Is this a good idea?)
- After doing the deletes it calls the routine usrrec() to skip over
- the tar records for the directory. This manages to mess up and call abort().
-
-
-
-
- 782.
- Date: Wed, 13 Dec 89 08:12:07 PST
- From: brent (Brent Welch)
- Subject: Long I/O waits
-
- Peter sent me the following two interesting messages about
- I/O behavior on Sprite.
-
- Date: Tue, 12 Dec 89 13:58:27 PST
- From: pmchen (Peter M. Chen)
- To: brent
- Subject: diff hangs
- Status: RO
-
- I was doing a diff of some VERY LARGE files (80 MB) and diff hung
- (didn't respond to ctrl-Z or ctrl-C). I also can't seem to kill the
- process (it's in the READY state). This has happened before. The
- files were /scratch/pmchen/db2.trace.11.20 and /scratch/db2.trace.11.20.
- Any ideas?
-
- Pete
-
- Date: Tue, 12 Dec 89 14:52:40 PST
- From: pmchen (Peter M. Chen)
- To: brent
- Subject: diff hanging
-
- The problem is repeatable. However, the process doesn't hang indefinitely,
- just about 5 minutes after which it returns.
-
- Pete
-
- It appears as if the diff process got on the end of a long I/O queue.
- Perhaps some other activity at the file server clogged up the disk.
-
-
-
- 783.
- Date: Wed, 13 Dec 89 16:58:26 PST
- From: Fred Douglis <douglis>
- Subject: can't mount /scratch2
-
- I don't know how to mount /scratch2. It didn't come up automatically
- even though /hosts/anise/mount exists. running fsattach complained it
- didn't know where "mount" was. running it with an option to specify
- /hosts/anise/mount caused it to check the disk fine but complain it
- didn't know anything about /bootTmp. Seems like something funny is
- going on regarding anise not being set up with /bootTmp the way other
- machines are. I didn't see anything in the fsattach man page to
- explain it.
-
-
-
- 784.
- Date: Wed, 13 Dec 89 22:37:03 PST
- From: tve (Thorsten von Eicken)
- Subject: problem with mig on sun4s
-
- [crackle pmake] mig
- Error execing program: unknown error (0)
-
- Also, pmake doesn't actually seem to migrate anything?
-
-
-
-
-
- 785.
- Date: Thu, 14 Dec 89 11:49:58 PST
- From: Fred Douglis <douglis>
- Subject: /usr/lib
-
- why does ld look in /usr/lib instead of /sprite/lib/%TM.md? this came
- up in an earlier bug report about /usr/lib being a link and still is a
- problem.
-
-
-
-
- 786.
- Date: Thu, 14 Dec 89 12:46:20 PST
- From: Fred Douglis <douglis>
- Subject: mkmf incompatibilities control needed
-
- we need a way for Makefiles to check some sort of version number in
- the system makefiles they include. thorsten's problem, i believe, was
- due to Makefile not defining TM while script.mk expected it to be
- defined. perhaps this should be a spring-cleaning item?
-
-
-
-
- 787.
- Date: Thu, 14 Dec 89 12:48:17 PST
- From: brent (Brent Welch)
- Subject: double insert cache bug found
-
- I've finally found something wrong with the cache. Ironically it
- was my mousetrap routine that uncovered it, but not the way it
- was supposed to. The original mousetrap is in the blockWrite
- routine. It looks through the block index map to make sure
- the block really belongs where it is being written.
- This causes extra uses of the indirect blocks, and this
- in turn exposed a "double insert" bug in Fscache_FetchBlock.
-
- If a cache block isn't found (say an indirect block), then Fscache_FetchBlock
- takes a block off the LRU list. This too might fail if the cache
- is full of dirty or in-use blocks. In this case Fscache_FetchBlock waits
- for room in the cache. The bug was that Fscache_FetchBlock didn't look in
- the hash table after it waited. (It only re-hashed if it first found
- the block but it was locked). It was possible for another process
- to load the indirect block into the cache, and then to have the original
- process wake up, take a block off the LRU list, and insert the
- block into the hash table again. Voila double insertion, and the previous
- incarnation of the block was lost. In the case of the indirect
- block the machine crashes when the second instance of the block
- gets deleted because there is no longer an entry in the hash table;
- it was removed when the first instance of the block was removed.
-
- This bug might also explain the fragment bug because the block cache
- is used when growing a fragment. However, I am not positive of this.
- UpgradeFragment fetches the block containing the previous incarnation
- of the fragment. It then changes the disk address of the fragment
- and unlocks the cache block. If a double insert happened then
- the first incarnation of the fragment might either get lost,
- or it might linger around and do damage (not sure about this).
- Or, perhaps some other block gets doubly inserted and wreaks havoc.
- At any rate, the original
- mousetrap is still in, so if this doesn't fix it I may catch
- a block being written out to a place it doesn't belong.
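- The essence of the fix is to look in the hash table again after any wait,
- since another process may have loaded the same block while this one slept.
- The sketch below is a hypothetical, single-threaded toy model, not
- Fscache_FetchBlock itself: the "hash table" is a small array, there is no
- LRU eviction, and the sleep is simulated by letting the caller name a block
- that "another process" inserts during the wait.
-

```c
/* Hypothetical toy model of the block cache, for illustration only. */
#define CACHE_SLOTS 8

typedef struct {
    int blocks[CACHE_SLOTS];
    int count;
} Cache;

/* Return the slot holding blockNum, or -1 if it is not cached. */
int cache_lookup(const Cache *c, int blockNum)
{
    for (int i = 0; i < c->count; i++)
        if (c->blocks[i] == blockNum)
            return i;
    return -1;
}

/* Fetch blockNum, inserting it if absent. `mustWait` models finding no
 * free block; `loadedMeanwhile` is a block another process inserts while
 * we sleep (-1 for none). The fix: look in the table again after waking. */
int fetch_block(Cache *c, int blockNum, int mustWait, int loadedMeanwhile)
{
    int slot = cache_lookup(c, blockNum);
    if (slot >= 0)
        return slot;
    if (mustWait) {
        /* Simulated sleep: another process may load a block meanwhile. */
        if (loadedMeanwhile >= 0 && c->count < CACHE_SLOTS)
            c->blocks[c->count++] = loadedMeanwhile;
        slot = cache_lookup(c, blockNum);   /* re-check after waiting */
        if (slot >= 0)
            return slot;
    }
    if (c->count == CACHE_SLOTS)
        return -1;                          /* no eviction in this model */
    c->blocks[c->count] = blockNum;
    return c->count++;
}

/* Number of copies of blockNum in the table; more than 1 is a double insert. */
int cache_copies(const Cache *c, int blockNum)
{
    int n = 0;
    for (int i = 0; i < c->count; i++)
        if (c->blocks[i] == blockNum)
            n++;
    return n;
}
```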
-
-
-
-
- 788.
- Date: Thu, 14 Dec 89 17:33:29 PST
- From: tve (Thorsten von Eicken)
- Subject: is realloc man page correct?
-
- The man page says that realloc is compatible with old versions where
- one is allowed to realloc a block one has freed since the last call
- to malloc. I'm porting a program which uses that behaviour (sic!)
- and I get the message "Mem_Size: storage block is free". I also
- had a look into /sprite/src/lib/c/stdlib/Mem_Size.c and I don't see
- support for the compatibility.
- I think incompatibility is OK in this case, but then please fix the
- man page. Or did I miss something?
- TvE
- (sorry, I don't have an easy example for the bug)
-
-
-
-
- 789.
- Date: Thu, 14 Dec 89 18:47:33 PST
- From: tve (Thorsten von Eicken)
- Subject: profiling doesn't work on ds3100
-
- If I compile with -pg, I get an error at the final load:
- Can't open: /usr/lib/mcrt0.o1.31 (No such file or directory)
- Is that fixable? Or am I doing something wrong?
-
-
-
-
- 790.
- Date: Fri, 15 Dec 89 08:29:27 PST
- From: mendel (Mendel Rosenblum)
- Subject: New sun4c kernel still has the NEW process problem
-
-
- jaywalk% sysstat -v
- jaywalk SPRITE VERSION 1.046 (sun4c) (14 Dec 89 17:30:19)
- jaywalk% ps -a | grep NEW
- 11231 NEW -33901099:-50
- 11230 NEW 28917113:11
- e1220 NEW 0:00 /users/mgbaker/cmds/screenscript -f ...
- d1226 NEW 0:00 sort /tmp/temp725536
- 2121d NEW 0:00 xgone
- b122c NEW 0:00 la
- 120b NEW 0:00 xgone
- a1212 NEW 0:00 la
- f121e NEW 0:00 sh -c /users/mgbaker/cmds/screenscript
- a1227 NEW 0:00 sh -c echo SUMMARY `hostname` `date`
- 9120e NEW 0:00 sed -f /users/mgbaker/cmds/screenscript.sed
- c1221 NEW 0:00 sed -f /users/mgbaker/cmds/screenscript.sed
- 71238 WAIT 0:00 grep NEW
- 11232 NEW 0:00
- jaywalk%
-
-
-
-
- 791.
- Date: Fri, 15 Dec 89 08:39:35 PST
- From: Fred Douglis <douglis>
- Subject: Re: New sun4c kernel still has the NEW process problem
-
- that problem won't go away until all sun4cs are running the new sun4c
- kernel. in fact, it may not go away until the sun4c mach module
- is changed to un-hold the migrate signal the way the other kernels do --
- right now, the change handles exec-time migration but not other migration.
- since almost all migration is at exec time or is of processes that migrated
- earlier at exec time, it shouldn't be a problem, but it still has to be
- fixed. i talked to mary about this briefly -- i hesitated to put the
- change into the sun4c mach module because the sun4/4c trap code
- is radically different from the others.
-
-
-
-
- 792.
- Date: Fri, 15 Dec 89 15:30:32 PST
- From: Fred Douglis <douglis>
- Subject: new ds3100 X server hangs with certain Xdefaults
-
- with the following at the end of my .Xdefaults (loaded via xrdb),
- I can't talk to the server. If I comment it out I can start X
- windows up just fine.
-
- *Text.Translations:
- Ctrl<Key>W: delete-previous-word()\n
- Ctrl<Key>U: beginning-of-line()
- kill-to-end-of-line()\n
- Meta<Key>k: kill-selection()\n
-
-
-
-
- 793.
- Date: Fri, 15 Dec 89 17:26:58 PST
- From: Fred Douglis <douglis>
- Subject: sendmail/naming problem
-
- someone resent a note to me that bounced, with mint constantly trying
- to forward to itself:
-
- ----- Transcript of session follows -----
- >>> DATA
- <<< 554 sendall: too many hops (17 max)
- 554 <douglis@@sprite.Berkeley.EDU>... Service unavailable: invalid argument
-
- ----- Unsent message follows -----
- Received: from mint.Berkeley.EDU by sprite.Berkeley.EDU (5.59/1.29)
- id AA991307; Fri, 15 Dec 89 17:18:22 PST
- ....
- Received: by rosemary.Berkeley.EDU (4.0/SMI-4.0)
- id AA05426; Fri, 15 Dec 89 17:16:37 PST
-
- I haven't seen this before, and other mail appears to work okay.
-
-
-
- 794.
- Date: Sun, 17 Dec 89 11:54:02 PST
- From: mendel (Mendel Rosenblum)
- Subject: sun4c dies horrible death
-
- I tried to kill a process on the debug list and jaywalk went into an infinite
- loop scrambling the video. I had to power cycle to get control back.
-
-
-
-
- 795.
- Date: Sun, 17 Dec 89 15:24:14 PST
- From: mgbaker (Mary Gray Baker)
- Subject: Known sparcstation bugs with processes on debug list
-
- There are 2 known bugs about continuing processes on the debug list on
- sparcstations. They are related. In the installed new kernel, the call
- to Proc_SuspendProc is in the underflow handler for processes that have
- bad stack pointers. I've already mailed bugs about this problem. Continuing
- these processes is a very bad idea since the underflow handler can't deal
- any further with a process with a bad stack pointer. This was an attempt
- to make debugging of these processes possible, but obviously I must do this
- a little differently. The other related problem is that migrated processes
- aren't supposed to go onto the debug list, and I didn't know this before.
- If the Proc_SuspendProc gets called in the underflow handler on a migrated
- process, the machine will die in List_Remove.
-
-
-
-
- 796.
- Date: Mon, 18 Dec 89 00:42:31 PST
- From: shirriff (Ken Shirriff)
- Subject: eqn on sun3 is confused
-
- Eqn puts 3 blank lines after each line containing an equation, when
- used on a sun3. It works fine on the ds3100.
-
-
-
- 797.
- Date: Mon, 18 Dec 89 12:32:54 PST
- From: pmchen (Peter M. Chen)
- Subject: diff and cmp
-
- decstation (subversion): diff shows them equivalent
-                          cmp shows them equivalent
- sun4 (anise):            diff shows them DIFFERENT
-                          cmp shows them equivalent
-
- The files are /scratch/pmchen/db2.11.22.{a,b}. Watch out, they're big (80 MB).
-
- I am unable to kill (even -9) my diff process. I also can't ^C or ^Z
- it. Aaah! The unkillable process! :-0
-
-
-
-
- 798.
- Date: Mon, 18 Dec 89 14:49:35 PST
- From: brent (Brent Welch)
- Subject: 1.046 FsioVerifyBlockWrite broken & fixed
-
- The 1.046 kernel has a botched FsioVerifyBlockWrite routine.
- It tested ok on arson, and it has been running on oregano.
- However, the bug shows up on Sun4s, so Allspice and Anise
- had trouble running this kernel. The bug causes write attempts
- to fail because the Verify routine returns a bogus value.
- I've already fixed the code and am installing a new fsio module.
-
-
-
- 799.
- Date: Mon, 18 Dec 89 16:17:31 PST
- From: tve (Thorsten von Eicken)
- Subject: sun4 cc problem
-
- On the sun4, compiling for sun4, cc1.sparc goes into the debugger with
- MachPageFault: Bus error in user proc ....
- To duplicate: cd /cad/src/cmds/cifplot; pmake sun4.md/transforms.o
- It works fine on a sun3, compiling for sun4.
-
-
-
-
- 800.
- Date: Tue, 19 Dec 89 15:57:19 PST
- From: mgbaker (Mary Gray Baker)
- Subject: Error when running out of processes
-
- My machine ran out of processes, but the error it got first was that it
- had run out of segments. However, it died with an attempt to free something
- it thought was already free, namely the free(argString) call in DoExec
- in the execError section. I can't see why it thought this was already free.
-
-
-
-
- 801.
- Date: Tue, 19 Dec 89 17:11:06 PST
- From: mgbaker (Mary Gray Baker)
- Subject: treason not realizing it's idle
-
- When treason is idle but has X running on it, rup often fails to report it
- as idle. Although I haven't verified that this is really the cause of the
- problem, it's as if there are mouse events generated even when nobody is
- moving the mouse. This has ramifications for migration, etc.
-
-
-
-
- 802.
- Date: Thu, 21 Dec 89 10:07:30 PST
- From: Fred Douglis <douglis>
- Subject: still problems with swapping errors
-
- when the net was acting up this morning i got an "error 2 from
- fs_read or fs_pageread" and my xwatch (xbiff) process died. when i
- tried to start a new one i hit "reserved instruction in ...". it
- seems like when there's a paging error on a code segment, the kernel
- isn't smart enough to nuke the segment and try again next time. last
- time this happened i had to copy the file into a new inode to get it
- to run.
-
-
-
-
- 803.
- Date: Thu, 21 Dec 89 12:22:18 PST
- From: douglis@rosemary.Berkeley.EDU (Fred Douglis)
- Subject: timer mutex deadlock after reboot
-
- mint rebooted just fine, though it had some odd complaints while checking
- /sprite that suggest the file system is on its way to getting trashed.
- when i left, it had printed a login prompt and machines were recovering.
- by the time i got back to 477, mint was in the debugger. it printed
- on its console that it was syncing its disks but didn't have an error
- message until below that point when it said that timerMutex was deadlocked.
- the holder PC and PCB were junk. i poked around in the debugger but
- couldn't find out where it was before that point, so i rebooted again and
- am crossing my fingers. any ideas why the PC/PCB wouldn't be right? that's
- in all kernels, not just special ones, correct?
-
-
-
-
- 804.
- Date: Fri, 22 Dec 89 08:32:18 PST
- From: brent (Brent Welch)
- Subject: sun4 (anise) X11R3 xinit dies
-
- One problem with X11R3 concerns anise, the sun4/260.
- xinit goes into the debugger upon startup. There
- may be some fix, but I was not able to run X11R3
- on anise because of this.
-
-
-
- 805.
- Date: Fri, 22 Dec 89 12:42:08 PST
- From: mendel (Mendel Rosenblum)
- Subject: /X/cmds.sun4/Xsprite dies frequently
-
- /X/cmds.sun4/Xsprite dies with much greater frequency (four times in the
- last couple of hours versus once a day) when the kernel grows over
- 6 megabytes. This might suggest that there is a bug in the sun4c VM operating
- with low numbers of pmegs and/or free memory pages. Don't bother to try
- to debug the Xsprite because you will get a watchdog reset every time.
-
-
-
-
- 806.
- Date: Fri, 22 Dec 89 13:01:22 PST
- From: mendel (Mendel Rosenblum)
- Subject: minor bug in Mx
-
- If you select a control-L and insert it into a Mx search window you get
- a small black rectangle rather than something representing a control-L.
- The search works (it finds the control-L's and not small black rectangles).
- Some control characters (such as control-G and control-F) come out as
- spaces in the search window.
-
-
-
-
-
- 807.
- Date: Fri, 22 Dec 89 17:21:38 PST
- From: pmchen (Peter M. Chen)
- Subject: floating point on sun3
-
- I've gotten some results that say (double) 49 / (double) 5030 * 1000.0 is 0.00.
- This only happens on the sun3; decstations and sun4s give the correct answer.
-
- I couldn't duplicate this in a simpler program, but you can see this by
- running ~pmchen/tmp/mult/t1 as me from ~pmchen/tmp/mult. The
- output file is in mult.out. Look for the line that says I/O's per second.
-
- The source is ~pmchen/raid/mult/mult.c
-
-
-
-
- 808.
- Date: Sun, 31 Dec 89 11:03:18 PST
- From: mendel (Mendel Rosenblum)
- Subject: Xmfb for sparcstation bug
-
- The X11R3 Xmfb seems to have trouble rendering small stipple-filled rectangles.
- This is why the racing stripes of the Sx toolkit are broken on jaywalk and
- the other black and white sparcstations. I tried to debug it, but the
- object files don't seem to match the source.
-
-
-
-
- 809.
- Date: Sun, 31 Dec 89 15:27:45 PST
- From: tve (Thorsten von Eicken)
- Subject: cc dies on sun3 and sun4: can't compile!
-
- try a pmake in /X11R3/src/cmds/xgraph, when it compiles xgraph.o
- cc1 either goes into debug or the next phase complains forever that
- /tmp/cc079707.s:6841:End-of-File not at end of a line
-
-
-
-
- 810.
- Date: Sun, 31 Dec 89 15:48:57 PST
- From: mgbaker (Mary Gray Baker)
- Subject: X11R3 color database still in trouble
-
- Now my window with black background and red foreground comes out completely
- red rather than completely black. I agree this is more colorful, but it's
- equally impossible to use. Also, my light blue background has turned itself
- to purple.
-
-
-
-
- 811.
- Date: Sun, 31 Dec 89 23:44:17 PST
- From: tve (Thorsten von Eicken)
- Subject: msgs doesn't seem to get updated
-
- At least there are more recent messages on ernie.
-
-
-
-